Mon. Not. R. Astron. Soc. 000, 000-000 (0000) Printed 10 November 2010 (MN LaTeX style file v2.2)

Optimal linear reconstruction of dark matter from halo 
catalogs 






Yan-Chuan Cai*, Gary Bernstein and Ravi K. Sheth 

Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104 
Center for Particle Cosmology, University of Pennsylvania, Philadelphia, PA 19104 



10 November 2010 






ABSTRACT 

We derive the weight function w(M) to apply to dark-matter halos that minimizes the stochasticity between the weighted halo distribution and its underlying mass density field. The optimal w(M) depends on the range of masses being used in the estimator. While the standard biased-Poisson model of the halo distribution predicts that bias weighting is optimal, the simple fact that the mass is composed of halos implies that the optimal w(M) will be a mixture of mass-weighting and bias-weighting. In N-body simulations, the Poisson estimator is up to 15× noisier than the optimal. Implementation of the optimal weight yields significantly lower stochasticity than weighting halos by their mass, bias or equal weighting in most circumstances. Optimal weighting could make cosmological tests based on the matter power spectrum or cross-correlations much more powerful and/or cost-effective. A volume-limited measurement of the mass power spectrum at $k = 0.2\,h\,{\rm Mpc}^{-1}$ over the entire z < 1 universe could ideally be done using only 6 million redshifts of halos with mass $M > 6 \times 10^{13}\,h^{-1}M_\odot$ ($1 \times 10^{13}$) at z = 0 (z = 1); this is 5× fewer than the Poisson model predicts. Using halo occupancy distributions (HOD) we find that uniformly-weighted catalogs of luminous red galaxies require > 3× more redshifts than an optimally-weighted halo catalog to reconstruct the mass to the same accuracy. While the mean HODs of galaxies chosen to lie above a threshold luminosity are fortuitously very similar to the optimal w(M), the stochasticity of the halo occupation degrades the mass estimator. Blue or emission-line galaxies are ~100× less efficient at reconstructing mass than an optimal weighting scheme. This suggests an efficient observational approach of identifying and weighting halos with a deep photo-z survey before conducting a spectroscopic survey. The optimal w(M) and mass-estimator stochasticity predicted by the standard halo model for $M > 10^{12}\,h^{-1}M_\odot$ are in reasonable agreement with our measurements, with the important exception that the halos must be assumed to be linearly biased samples of a "halo field" that is distinct from the mass field. Halo catalogs extending below $10^{12}\,h^{-1}M_\odot$ are more stochastic than the halo model predicts, suggesting that halo exclusion or other effects violate the assumption that halos sample this "halo field" via a Poisson process.
1 INTRODUCTION

One challenge in survey cosmology is to infer the true dark matter density field from observables, i.e. galaxies and galaxy clusters from galaxy surveys in the UV, visible, NIR, or radio, or galaxy clusters detected in X-ray or Sunyaev-Zeldovich (SZ; Sunyaev & Zeldovich 1972) surveys. All these observable populations, as well as their host dark matter halos, are biased tracers of their underlying dark matter. It is well known that the bias depends on halo properties, e.g. mass, formation time, etc. Moreover, the tracers are not deterministic, even on large scales, i.e. there is randomness in their clustering relative to that of the dark matter. Understanding such randomness, or stochasticity, is crucial for precise mass reconstruction and for strengthening constraints on cosmological parameters from observables. In this work, we take a step back by assuming that we have perfect "observations" of dark matter halos and we aim to understand the stochasticity between halos and dark matter, and to develop an optimal mass estimator from halo catalogs.

The simplest assumption about the distribution of halos (or galaxies) is that they are drawn from the mass distribution in a biased Poisson process. This remains a common assumption, even though mass conservation arguments strongly suggest that this model cannot be correct (Sheth & Lemson 1999). On the other hand, the consequences of the simple truism that the mass is the mass-weighted sum of all halos are only just beginning to be worked out. Abbas & Sheth (2007) showed that the usual expressions for halo bias (Mo & White 1996; Sheth & Tormen 2002), which derive from the fact that halo abundances are top-heavy (i.e., massive halos are over-represented) in denser regions, can also be derived from noting that a region which happens to have a top-heavy mass function will also be overdense.

The two approaches provide rather different prescriptions for how to construct an estimator of the mass field given (a subset of) the halos. The estimator assuming halos are Poisson sampled from the mass is considerably noisier than the one in which halos are mass weighted (Seljak et al. 2009; Hamaus et al. 2010, hereafter H10). In what follows, we will compare these two approaches, and provide a prescription for finding the optimal weight when only a subset of the halos is available.

In many respects, our approach is similar to that of H10. 
However, while they concentrate on the problem of writing 
a weighted halo field as a linear function of the mass field, 
we will focus on the inverse problem - that of writing the 
mass field as a weighted sum over the halos, and minimizing 
the RMS residual E of the mass estimation. 

This re-evaluation of the stochasticity in mass estima- 
tors is important for observational programs to constrain 
cosmological parameters via measurement of the mass power 
spectrum: the weighting scheme that minimizes stochastic- 
ity may suggest a change in targeting strategies for spectro- 
scopic surveys, and will reduce the resources required for a 
volume-limited measure of the mass power spectrum. Other 
cosmological probes and tests of general relativity require 
cross-correlation of the estimated mass field with other ob- 
servables such as gravitational lensing. These experiments 
should gain even more through the use of optimal mass re- 
construction. 

Section 2 describes previous work on how to use halos to estimate the mass distribution, when halos are a sampling of the mass field, and then contrasts these with the prescription which follows from assuming that the mass is mass-weighted halos. Section 3 presents various tests in simulations of this approach. Section 4 discusses implications of our findings, and a final section summarizes.



2 BIAS, STOCHASTICITY, AND LINEAR 
ESTIMATORS 



Suppose we have two random variables, m and h, both defined to have zero mean. We will later be interested in cases where these are the mass and halo density fluctuations in spatial cells, or Fourier coefficients of the fluctuation fields. Their covariance matrix can always be written as

$$ C = \begin{pmatrix} P & r\,b_{\rm var} P \\ r\,b_{\rm var} P & b_{\rm var}^2 P \end{pmatrix}. \qquad (1) $$

Here r is the correlation coefficient between m and h, and $b_{\rm var}$ is called the "variance bias" by Dekel & Lahav (1999). Throughout this paper, the variable P without subscript will mean the power in the mass distribution. We can quantify the error in any estimator $\hat m$ of the mass variable via

$$ E^2 \equiv \frac{\langle (\hat m - m)^2 \rangle}{\langle m^2 \rangle}. \qquad (2) $$

We will refer to E as the stochasticity between the two variables. We can extend the definition to mean the RMS residual error after subtraction of any estimator $\hat m$ of the mass field. If we restrict the estimator $\hat m$ to be a linear function of h, i.e. $\hat m = w h$ for some weight w, then using the covariance (1) in (2) yields

$$ E^2 = 1 + w^2 b_{\rm var}^2 - 2 w r\,b_{\rm var}. \qquad (3) $$

E is minimized at $w = r/b_{\rm var}$, which yields

$$ E_{\rm opt}^2 = 1 - r^2 = 1 - \frac{\langle m h \rangle^2}{\langle m^2 \rangle \langle h^2 \rangle}. \qquad (4) $$

In many cases cosmological information is carried by the ratio b between the two variables. If m and h are drawn from a bi-variate Gaussian distribution, then we can use the Fisher matrix formalism from Tegmark et al. (1997) to form the covariance matrix of the parameters $\{P, r, b_{\rm var}\}$ in C. If we draw $N_m$ pairs (m, h) from the distribution, the uncertainties on $b_{\rm var}$ and P after marginalization over other parameters are

$$ \frac{\sigma_P}{P} = \sqrt{\frac{2}{N_m}}, \qquad (5) $$

$$ \frac{\sigma_b}{b_{\rm var}} = \frac{E_{\rm opt}}{\sqrt{N_m}} = \frac{E_{\rm opt}}{\sqrt{2}} \frac{\sigma_P}{P}. \qquad (6) $$

[See Sheth et al. (2005) for the real space version of this argument.] Note that the same stochasticity E appears in the fractional uncertainty on $b_{\rm var}$. [Footnote 1: Seljak & Warren (2004) and Bonoli & Pen (2009) write $\sigma_b/b = S = \sqrt{2(1-r)}$ per mode. The difference between this and the correct expression $\sqrt{1-r^2}$ is small when r is close to 1.] The appeal of cosmological tests based on ratios like $b_{\rm var}$ instead of power variables like P is that the former require a factor $E_{\rm opt}^2/2$ fewer mode measurements to reach the same accuracy and can hence be more powerful for a given survey. We will show below that this factor can reach $E_{\rm opt}^2/2 < 10^{-3}$ for estimation of the mass distribution from halo distributions.

In this paper we will adopt a covariance definition of bias for variable h against m, in which the covariance matrix is written as

$$ C = \begin{pmatrix} P & bP \\ bP & b^2 P + \mathcal{N} \end{pmatrix}. \qquad (7) $$

The "noise" component is $\mathcal{N} \ge 0$. Note that our P is explicitly the mean power spectrum in realizations of the field m. We do not subtract shot noise.

From (7), the stochasticity of h and m is $E_{\rm opt}^2 = \mathcal{N}/(b^2 P + \mathcal{N}) = (1 + b^2 P/\mathcal{N})^{-1}$. If h is a Fourier coefficient of the fluctuations in a biased Poisson sampling of the field m, with mean space density n, then $\mathcal{N} = 1/n$. For any field h, therefore, we can define an effective space density via

$$ (n b^2)_{\rm eff} P = E_{\rm opt}^{-2} - 1. \qquad (8) $$
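As a numerical illustration of equations (3), (4), (7) and (8), the following minimal sketch evaluates the stochasticity of a single biased Poisson tracer. The values of `P`, `b` and `n` are made up for illustration and do not come from the simulations discussed later.

```python
import numpy as np

def stochasticity_sq(w, r, b_var):
    """E^2 of the linear estimator m_hat = w * h, as in equation (3)."""
    return 1.0 + w**2 * b_var**2 - 2.0 * w * r * b_var

# Made-up numbers for a biased Poisson tracer: power, bias, space density.
P, b, n = 2.0e4, 1.5, 3.0e-4
noise = 1.0 / n                               # Poisson shot noise, N = 1/n
var_h = b**2 * P + noise                      # <h^2> from equation (7)
b_var = np.sqrt(var_h / P)                    # variance bias
r = b * P / np.sqrt(P * var_h)                # correlation coefficient

w_opt = r / b_var                             # minimiser of equation (3)
E_opt_sq = stochasticity_sq(w_opt, r, b_var)  # equals 1 - r^2, equation (4)
nb2_eff = (1.0 / E_opt_sq - 1.0) / P          # effective n b^2, equation (8)
```

For a pure Poisson tracer the effective space density returned by equation (8) is simply $n b^2$, so any departure from that value in a measured covariance signals non-Poisson noise.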



Recently, H10 concentrated on optimization of the quantity

$$ \sigma_w^2 \equiv \langle (\delta_w - b_w \delta_m)^2 \rangle, \qquad (9) $$

where $\delta_w$ is a weighted halo map and $b_w \delta_m$ is a linear function of the mass fluctuations. Our approach (and motivation) is similar, although we focus on the inverse problem: that of writing the mass field as a weighted sum over the halos, and minimizing the RMS residual E of the mass estimation.



2.1 Multidimensional linear estimators 

Now consider attempting to estimate the mass fluctuation $\delta_m$ from the fluctuations $\delta_i$ of a collection of tracers, e.g. the fluctuations in bins of halo mass. We wish to construct an estimator $\hat\delta_m$ from a linearly weighted sum of the tracers:

$$ \hat\delta_m = \sum_i w_i \delta_i. \qquad (10) $$

With complete generality we can define the power P, the covariance bias vector $\mathbf{b}$ and the halo covariance matrix C:

$$ \langle \delta_m^2 \rangle \equiv P, \qquad (11) $$
$$ \langle \delta_m \delta_i \rangle \equiv b_i P, \qquad (12) $$
$$ \langle \delta_i \delta_j \rangle \equiv C_{ij}. \qquad (13) $$

Note that we have not subtracted a shot noise contribution from the halo variance. We will never subtract a Poisson shot noise term from the power spectra in this paper because it is our intention to test the simple assumptions about the nature of shot noise.

For any choice of weight vector $\mathbf{w}$, the stochasticity of the resulting mass estimator is

$$ E^2 = 1 - 2\,\mathbf{b}^T \mathbf{w} + \mathbf{w}^T C \mathbf{w} / P. \qquad (14) $$

The choice of $\mathbf{w}$ that minimizes the stochasticity is

$$ \mathbf{w}_{\rm opt} = (C/P)^{-1} \mathbf{b}, \qquad (15) $$

making

$$ E_{\rm opt}^2 = 1 - \mathbf{b}^T (C/P)^{-1} \mathbf{b}. \qquad (16) $$

Note that this is the stochasticity-minimizing linear estimator quite generally, independent of any assumptions about Gaussianity or any details of the process generating the halo distributions. Linear estimators which minimize the RMS residual of a target are known as Wiener filters, and our form $\mathbf{w} \propto C^{-1}\mathbf{b}$ is typical for such cases. Also it is generally true that the optimized mass estimator $\hat\delta_{\rm opt}$ will satisfy $\langle \hat\delta_{\rm opt}^2 \rangle = \langle \hat\delta_{\rm opt} \delta_m \rangle$.

The weight in Eq. (15) is proportional to that in eqn. (19) of H10, even though their derivation assumes Gaussianity and is not based on minimizing the stochasticity E.
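The general estimator of equations (14) to (16) can be sketched in a few lines of NumPy. The three-bin covariance below is a toy input assumed purely for illustration; the helper names `optimal_weights` and `stochasticity_sq` are hypothetical and not part of any published code.

```python
import numpy as np

def optimal_weights(C, b, P):
    """Equation (15): w_opt = (C/P)^{-1} b, computed with a linear solve."""
    return np.linalg.solve(C / P, b)

def stochasticity_sq(w, C, b, P):
    """Equation (14): E^2 = 1 - 2 b^T w + w^T C w / P."""
    return 1.0 - 2.0 * (b @ w) + (w @ C @ w) / P

# Toy inputs (assumed values, three halo-mass bins).
P = 1.0e4
b = np.array([0.8, 1.2, 2.0])
n = np.array([1.0e-3, 3.0e-4, 5.0e-5])
C = P * np.outer(b, b) + np.diag(1.0 / n)

w_opt = optimal_weights(C, b, P)
E_opt_sq = stochasticity_sq(w_opt, C, b, P)   # matches equation (16)
```

Using a linear solve rather than an explicit matrix inverse is a numerical convenience only; the result is the same weight vector as equation (15).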



2.2 Principal components 

Bonoli & Pen (2009) investigate the stochasticity of the halo field with respect to the matter by taking the weight function to be the first (or higher) principal component (PC) of the halo covariance matrix C. In other words the weight $\mathbf{w}$ is the eigenvector of C having the largest eigenvalue. If the $\delta_i$ are rotated into the principal components, then the matrix C becomes diagonal. If principal component j has correlation coefficient $r_j$ with respect to the matter, then it is easy to see that

$$ E_{\rm opt}^2 = 1 - \sum_j r_j^2. \qquad (17) $$

Hence a drawback of PC weighting is that the stochasticity of the first principal component (PC1) achieves the optimally low value only if no other PCs correlate with the mass. Bonoli & Pen (2009) show that this is a good approximation only on the largest scales.
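Equation (17) can be checked numerically by rotating a toy covariance into its principal components. The sketch below reuses an assumed three-bin Poisson-like covariance; none of the numbers are measurements.

```python
import numpy as np

P = 1.0e4
b = np.array([0.8, 1.2, 2.0])
n = np.array([1.0e-3, 3.0e-4, 5.0e-5])
C = P * np.outer(b, b) + np.diag(1.0 / n)

# Principal components: eigh returns eigenvalues in ascending order,
# with the matching eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(C)

# Correlation of PC j with the mass:
# <pc_j delta_m> = P (e_j . b) and <pc_j^2> = eigval_j.
r = P * (eigvecs.T @ b) / np.sqrt(P * eigvals)

E_opt_sq = 1.0 - np.sum(r**2)   # equation (17), all components
E_pc1_sq = 1.0 - r[-1]**2       # first principal component only
```

The gap between `E_pc1_sq` and `E_opt_sq` is the correlation with the mass that is carried by the remaining principal components.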

Another issue with PC weighting is that it is not stable to re-binning of the halo population. For example, when the halos occupy the mass distribution with a Poisson process, the bins must be chosen with equal $n_i$ in order for the first PC to dominate the correlation with the mass (as was done by H10). The optimal weighting Eq. (30) shown later in our paper is recovered only in the limit of vanishing shot noise, $n_i P \to \infty$. Since PC weighting is binning-dependent and non-optimal, we will not focus on it.

H10 find that the optimal weight vector is very close to the weakest principal component of the "shot noise matrix" $C - \mathbf{b} P \mathbf{b}^T$ when the halos are binned in equal numbers. This is intriguing since there is no algebraic requirement for the correspondence.



2.3 Mass as mass-weighted halos 

2.3.1 Mass completeness relation 

If we partition all of the mass into halos, i.e. extend the halo catalog to zero mass, then the mass distribution is the mass-weighted sum of the halo distributions (e.g. Abbas & Sheth 2007). Hence we will obtain a perfect E = 0 estimator of mass if we weight each halo bin by the fraction of the total mass it holds:

$$ \eta_i = \frac{n_i m_i}{\sum_j n_j m_j} = \frac{n_i m_i}{\bar\rho}, \qquad (18) $$

where $m_i$ is the mean mass of halos in bin i and $\bar\rho$ is the overall mean density. Since E = 0 is clearly the optimal result, it is mass weighting which is optimal. If, on the other hand, halos occupy the mass distribution via a biased Poisson process, it is optimal to weight halos by their bias factors (we show this explicitly below). Therefore, the biased Poisson model cannot be correct in the limit that the halo catalog includes all of the mass.

The simple truism that the mass is the mass-weighted sum of all halos suggests that the optimal estimator will tend toward mass weighting as we include lower-mass halos. Park & Choi (2009) note in N-body simulations that mass-weighted halo catalogs attain lower stochasticity than uniform weighting. Seljak et al. (2009) show that weighting by mass (or other functions of mass) produces significantly lower stochasticity than expected from Poisson shot noise. H10 derived the weight function which optimizes $\sigma_w^2$. We derive the analogous optimal weight for E.

2.3.2 Mass estimation with incomplete halo catalogs 

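Suppose we are given a list of halo masses and positions. Suppose that this list is complete down to some limiting mass $m_d$. We can assign the remainder of the mass to a "dust bin," which contains the fraction $\eta_d = 1 - \sum_i \eta_i$ of mass that is not in the halos.

If $\delta_d$ is the relative density fluctuation field of the mass in the dust bin, we can define the power of the dust field as

$$ P_d \equiv \langle \delta_d^2 \rangle \qquad (19) $$

and a bias vector $\mathbf{c}_d$ between halos and dust by

$$ c_{di} \equiv \frac{C_{di}}{P_d} = \frac{\langle \delta_d \delta_i \rangle}{P_d}. \qquad (20) $$

Because

$$ \delta_m = \eta_d \delta_d + \sum_i \eta_i \delta_i, \qquad (21) $$

the power, bias and optimal weighting for the halo bins against the mass are:

$$ P = \eta_d^2 P_d + 2 \eta_d P_d\, \boldsymbol{\eta}^T \mathbf{c}_d + \boldsymbol{\eta}^T C \boldsymbol{\eta}, \qquad (22) $$
$$ P \mathbf{b} = \eta_d P_d\, \mathbf{c}_d + C \boldsymbol{\eta}, \qquad (23) $$
$$ \mathbf{w}_{\rm opt} = \boldsymbol{\eta} + (\eta_d P_d)\, C^{-1} \mathbf{c}_d, \qquad (24) $$
$$ E_{\rm opt}^2 = \frac{\eta_d^2}{P/P_d} \left[ 1 - \mathbf{c}_d^T C^{-1} \mathbf{c}_d\, P_d \right]. \qquad (25) $$

In this formulation it is apparent that as our halo catalog extends to lower masses and $\eta_d \to 0$, the optimal weight $\mathbf{w}_{\rm opt} \to \boldsymbol{\eta}$, i.e. mass weighting, and $E^2 \to 0$. The further question of interest for finite $\eta_d$ is: How well do the known halo fluctuations $\delta_i$ predict the dust bin density $\delta_d$?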
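The dust-bin relations (22) to (25) follow from (15) and (16) for any joint covariance of dust and halo bins. A minimal check, assuming a random positive-definite toy covariance and made-up mass fractions `eta`, is:

```python
import numpy as np

rng = np.random.default_rng(1)
nbins = 4
A = rng.normal(size=(nbins + 1, nbins + 1))
full = A @ A.T                      # toy joint covariance of (dust, halo bins)

P_d = full[0, 0]                    # dust power, equation (19)
c_d = full[0, 1:] / P_d             # dust-halo bias vector, equation (20)
C = full[1:, 1:]                    # halo covariance matrix

eta = np.array([0.10, 0.15, 0.20, 0.25])   # made-up halo mass fractions
eta_d = 1.0 - eta.sum()                    # mass fraction left in the dust bin

P = eta_d**2 * P_d + 2.0 * eta_d * P_d * (eta @ c_d) + eta @ C @ eta   # (22)
Pb = eta_d * P_d * c_d + C @ eta                                         # (23)
w_opt = eta + eta_d * P_d * np.linalg.solve(C, c_d)                      # (24)
E_opt_sq = (eta_d**2 * P_d / P) * (1.0 - P_d * (c_d @ np.linalg.solve(C, c_d)))  # (25)
```

The bracket in equation (25) is the fraction of the dust power that the halo bins cannot predict, which is why the result vanishes both when $\eta_d \to 0$ and when the dust is perfectly correlated with the halos.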

2.4 Sampling Models 

So far the analysis has been completely general as to the 
generation of the mass field and the designation of halos 
within it. Now we examine models for the relation between 
halos and mass. 

The most common assumption about the distribution 
of halos (or galaxies) is that they are drawn from the mass 
distribution in a biased Poisson process. If we are examining 
the Fourier coefficients of the mass and halo distributions, 
the biased Poisson model can be broken down into three 
assumptions: 

(i) The halos are a linearly biased sampling of some continuous "halo field" $\delta_h$ with power $\langle \delta_h^2 \rangle = P_h$, via some stochastic process that has no spatial correlations. Then the covariance matrix of halo bins can be written as a rank-one matrix plus a diagonal "shot noise:"

$$ C = P_h \mathbf{v}\mathbf{v}^T + {\rm diag}(\mathcal{N}), \qquad (26) $$
$$ \mathcal{N}_i = f_i/n_i. \qquad (27) $$

Here $f_i$ is a "clump size" factor relating the noise $\mathcal{N}_i$ in bin i to the space density $n_i$ of sources in the bin. Halos in bin i are drawn with bias $v_i$ from the halo field.

(ii) The halos are placed by a Poisson process so that $f_i = 1$.

(iii) We identify the halo field $\delta_h$ with the mass $\delta_m$ such that $P_h = P$ and $\mathbf{v} = \mathbf{b}$. In this case $C = P\,\mathbf{b}\mathbf{b}^T + {\rm diag}(1/n_i)$.

Assumption (iii) of the biased-Poisson model has been noted to be inconsistent with the assumption that the halo catalog can be extended to comprise all of the mass. We will therefore consider models in which assumption (i) holds without (iii), such that the halos sample a field that is distinct from the mass distribution. We will take care therefore to distinguish $\mathbf{v}$, the bias of the halos with respect to the halo field $\delta_h$, from $\mathbf{b}$, which we always define via the covariance with mass as per equation (12).

The assumptions that halos are linearly biased, and that the halo generating process has no spatial correlations, are idealizations: in fact, halos in simulations do not overlap (almost by definition), and their bias is non-linear. We will return to the limits of these assumptions later.

When C takes the form (26), two things are of note: first, this description is stable under re-binning of the halos in the limit of narrow bins. More specifically, if $v_i$ and $f_i$ are slowly-varying functions of the mass $m_i$ of halos in bin i, then the $v_i$ and $f_i$ do not change if two adjacent bins are merged. In other words we can write functions v(m) and f(m), and all of our linear-algebra formulations can be carried over into integrals over halo mass m. The second useful fact about (26) is that it can be inverted analytically using the Sherman-Morrison formula:

$$ \mathcal{N}_i\, (C^{-1})_{ij} = \delta_{ij} - \frac{v_i v_j\, P_h/\mathcal{N}_j}{1 + \sum_k v_k^2\, P_h/\mathcal{N}_k}, \qquad (28) $$
$$ P_h/\mathcal{N}_i = n_i P_h / f_i. \qquad (29) $$


When all three conditions of the biased-Poisson model are met, the optimal weight function and stochasticity are found simply from (15) and (16), using the inverse (28):

$$ \frac{w_{{\rm opt},i}}{n_i} = \frac{b_i P}{1 + \sum_j n_j b_j^2 P}, \qquad (30) $$
$$ E_{\rm opt}^2 = \left( 1 + \sum_i n_i b_i^2 P \right)^{-1}, \qquad (31) $$
$$ (n b^2)_{\rm eff} = \sum_i n_i b_i^2. \qquad (32) $$

Here $w_{{\rm opt},i}/n_i$ is the weight applied to each individual halo in bin i. This recovers the result from Percival et al. (2004) that the optimal linear mass estimator in a Poisson model weights each halo by its bias $b_i$ (times a mass-independent factor), and the stochasticity of the estimator is determined by $\sum_i n_i b_i^2 P$. Conveniently the weights scale with the bias, independent of the range of halo masses included in the estimator. This property does not hold for more general forms of C and $\mathbf{b}$, e.g. it fails when $f_i \neq 1$ and condition (iii) is violated.
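A quick numerical check of the biased-Poisson results (30) and (31), including the statement that the per-halo weights stay proportional to the bias when the mass range changes, can be written as follows. The bias and density values are assumptions chosen for illustration, and `poisson_optimum` is a hypothetical helper name.

```python
import numpy as np

def poisson_optimum(b, n, P):
    """Optimal per-bin weights and E^2 for C = P b b^T + diag(1/n)."""
    C = P * np.outer(b, b) + np.diag(1.0 / n)
    w = np.linalg.solve(C / P, b)          # equation (15)
    return w, 1.0 - b @ w                  # equation (16)

P = 1.0e4
b = np.array([0.8, 1.2, 2.0, 3.0])
n = np.array([1.0e-3, 3.0e-4, 5.0e-5, 5.0e-6])

w_all, E_all_sq = poisson_optimum(b, n, P)
w_top, E_top_sq = poisson_optimum(b[2:], n[2:], P)   # drop the two low-mass bins

per_halo_all = w_all / n        # weight applied to each halo in a bin
per_halo_top = w_top / n[2:]
```

Dropping bins changes only the common normalization of the per-halo weights, while the stochasticity rises because $\sum_i n_i b_i^2 P$ shrinks.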

In this paper we will not assume that the halos occupy the matter distribution via a biased Poisson process. We will examine the C matrix for halos in numerical simulations, examine what if any of the three biased-Poisson conditions actually holds, and then use the general formulae (15) and (16) to find how the optimal stochasticity differs from the Poisson predictions.

2.4.1 General sampling model

We now examine the case where all halos, and the dust, are indeed placed by a local process that is biased relative to some halo field $\delta_h$, so assumption (i) holds but (ii) and (iii) are not assumed. This model illustrates the difference between the halo field that is sampled and the mass field that the halos comprise. The two fields cannot be equivalent, even in the linear regime. In §3.7 we examine whether halo covariance matrices measured in simulations are in fact consistent with this sampling model.

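If all the halos and dust are placed in this halo field by independent processes, and we define a weighted field

$$ \delta_v \equiv \frac{\sum_i n_i v_i \delta_i}{\sum_i n_i v_i}, \qquad (33) $$

then the covariances between the dust, the halo bins, and $\delta_v$ are

$$ C = \mathbf{v}\mathbf{v}^T P_h + {\rm diag}(f_i/n_i), \qquad (34) $$
$$ P_d \equiv C_{dd} = v_d^2 P_h + \mathcal{N}_d, \qquad (35) $$
$$ C_{di} = v_d v_i P_h, \qquad (36) $$
$$ C_{dv} = v_d\, \frac{\sum_i n_i v_i^2}{\sum_i n_i v_i}\, P_h, \qquad (37) $$
$$ C_{vv} = \left( \frac{\sum_i n_i v_i^2}{\sum_i n_i v_i} \right)^2 P_h + \frac{\sum_i n_i v_i^2 f_i}{\left( \sum_i n_i v_i \right)^2}. \qquad (38) $$

We have freedom in setting the normalization of $v_d$ and $\mathbf{v}$, which we do by requiring

$$ \eta_d v_d + \boldsymbol{\eta} \cdot \mathbf{v} = 1. \qquad (39) $$

In this case the mass, which is the sum of halo and dust densities, has power

$$ P = C_{mm} = P_h + \eta_d^2 \mathcal{N}_d + \sum_i \eta_i^2 f_i / n_i \qquad (40) $$
$$ \phantom{P} = P_h + \eta_d^2 \mathcal{N}_d + (1 - \eta_d)^2\, \frac{\langle f m^2 \rangle}{n\, \langle m \rangle^2} \qquad (41) $$
$$ \phantom{P} = P_h + \mathcal{N}_m. \qquad (42) $$

The angle brackets denote number-weighted averages over the halo population, $n = \sum_i n_i$ is the total number density of halos in the catalog, and the final expression defines $\mathcal{N}_m$. We see that when the mass field is a sampled realization of the halo field, then $P = C_{mm}$ is larger than $P_h$ by terms representing the sampling shot noise.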

The bias $\mathbf{b}$ of the halo mass bins relative to the mass distribution will not in general equal the bias $\mathbf{v}$ with respect to the halo field. Since $b_i P = \langle \delta_i \delta_m \rangle$, we can expand $\delta_m$ using (21), and use the C elements above, to derive

$$ b_i = \frac{v_i + m_i f_i / (P_h \bar\rho)}{P/P_h}. \qquad (43) $$

Formally, only $v_i \propto f_i m_i$ will yield $\mathbf{b} \propto \mathbf{v}$. In general, $b_i > v_i$ at sufficiently large masses, and $b_i < v_i$ at lower masses. Equality is at $m_i = b_i \mathcal{N}_m \bar\rho$, which occurs for z = 0 at $\sim 10^{14}\,h^{-1}M_\odot$. In a ΛCDM model, the b = v crossover will occur for halos with $b \approx 1.6$ for a wide range of redshifts. We can also solve for the weight vector of the optimal linear mass estimator:

$$ \frac{w_{{\rm opt},i}}{n_i} = \frac{m_i}{\bar\rho} + \eta_d v_d\, (v_i/f_i)\, P_h E_{\rm pois}^2, \qquad (44) $$

where we have set $E_{\rm pois}^2 = (1 + \sum_j n_j v_j^2 P_h / f_j)^{-1}$; the sum over j is over the halo bins in the catalog. $E_{\rm pois}$ describes the fidelity with which the halos can estimate the halo field, as opposed to the mass field. Notice that the first term is the mass weighting ($w \propto m$), and the second correction term has the same form as that for the standard Poisson model ($w \propto v$), but its importance depends on the mass fraction in dust, and how it clusters. Thus, equation (44) is a weighted sum of equations (18) and (30). As the halo catalog comprises more of the total mass, $\eta_d \to 0$, and mass weighting becomes optimal.
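The relations (40) to (44) can be confirmed against the general formulae (15) and (21) with a small numerical model. Everything below (masses, densities, biases, `P_h`, `N_d`) is a toy assumption; the only constraint imposed is the normalization (39).

```python
import numpy as np

rho = 1.0                                   # mean density (arbitrary units)
m = np.array([0.5, 2.0, 8.0, 30.0])         # mean halo mass per bin
n = np.array([0.4, 0.1, 0.02, 0.004])       # number density per bin
f = np.array([1.0, 1.0, 1.2, 1.5])          # clump-size factors
eta = n * m / rho                           # mass fractions, equation (18)
eta_d = 1.0 - eta.sum()                     # dust mass fraction

v_raw, v_d_raw = np.array([0.8, 1.0, 1.5, 2.5]), 0.6
norm = eta_d * v_d_raw + eta @ v_raw
v, v_d = v_raw / norm, v_d_raw / norm       # enforce equation (39)
P_h, N_d = 50.0, 3.0

C = P_h * np.outer(v, v) + np.diag(f / n)   # equation (34)
C_dd = v_d**2 * P_h + N_d                   # equation (35)
C_di = v_d * v * P_h                        # equation (36)

# Mass power and covariance bias from delta_m = eta_d delta_d + eta . delta, eq. (21)
P = eta_d**2 * C_dd + 2.0 * eta_d * (eta @ C_di) + eta @ C @ eta
b = (eta_d * C_di + C @ eta) / P
w = np.linalg.solve(C / P, b)               # equation (15)

N_m = eta_d**2 * N_d + np.sum(eta**2 * f / n)                      # (40)-(42)
E_pois_sq = 1.0 / (1.0 + np.sum(n * v**2 * P_h / f))
b_eq43 = (v + m * f / (P_h * rho)) / (P / P_h)                     # (43)
w_eq44 = n * (m / rho + eta_d * v_d * (v / f) * P_h * E_pois_sq)   # (44)
```

In this sketch the per-bin weight `w` equals $n_i$ times the per-halo weight of equation (44), so the mass term and the bias term can be inspected separately.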

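If we define $\delta_{\rm opt} = \sum_i w_{{\rm opt},i} \delta_i$, then the optimized stochasticity is

$$ E_{\rm opt}^2 = 1 - \frac{\langle \delta_{\rm opt} \delta_m \rangle^2}{\langle \delta_{\rm opt}^2 \rangle \langle \delta_m^2 \rangle} = 1 - \frac{\langle \delta_{\rm opt} \delta_m \rangle}{\langle \delta_m^2 \rangle} \qquad (45) $$
$$ \phantom{E_{\rm opt}^2} = \frac{\eta_d^2 C_{dd}}{C_{mm}} \left( 1 - \frac{C_{dv}^2}{C_{dd} C_{vv}} \right) \qquad (46) $$
$$ \phantom{E_{\rm opt}^2} = \frac{\eta_d^2}{C_{mm}} \left( \mathcal{N}_d + v_d^2 P_h E_{\rm pois}^2 \right). \qquad (47) $$

We have written the second equality explicitly to show that $E_{\rm opt}$ makes physical sense: since $\eta_d^2 C_{dd}/C_{mm}$ is the fraction of the total $C_{mm}$ that is in dust, the stochasticity is the yet smaller portion of this dust power that cannot be recovered via the correlation between the dust and the bias weighted halos.

The final expression shows that $E_{\rm opt} \to 0$ as the mass fraction in dust $\eta_d \to 0$, as it should. When $P_h \to 0$, then $E_{\rm opt}^2 \to \eta_d^2 \mathcal{N}_d / \mathcal{N}_m$: in this limit, the stochasticity is determined by the fraction of the noise term $\mathcal{N}_m$ which is contributed by the dust. The opposite limit is when $P_h \gg \mathcal{N}_m$, where $P \approx P_h$ and $E_{\rm opt} \to \eta_d v_d E_{\rm pois}$. Since $\eta_d v_d < 1$, this is why the optimal stochasticity can be substantially smaller than in the Poisson model.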
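The closed form (47) and its low-power limit can be compared with the general expression (16) in a short sketch. The function name `optimal_stochasticity` and all input values are hypothetical; the model is the sampling model of this section with assumed toy parameters.

```python
import numpy as np

def optimal_stochasticity(P_h, v, v_d, eta, eta_d, n, f, N_d):
    """Return E_opt^2 from the general formula (16) and from equation (47)."""
    C = P_h * np.outer(v, v) + np.diag(f / n)
    C_di = v_d * v * P_h
    C_dd = v_d**2 * P_h + N_d
    P = eta_d**2 * C_dd + 2.0 * eta_d * (eta @ C_di) + eta @ C @ eta
    Pb = eta_d * C_di + C @ eta
    E_general = 1.0 - Pb @ np.linalg.solve(C, Pb) / P
    E_pois_sq = 1.0 / (1.0 + np.sum(n * v**2 * P_h / f))
    E_eq47 = eta_d**2 * (N_d + v_d**2 * P_h * E_pois_sq) / P
    return E_general, E_eq47

# Toy inputs (assumed values).
m = np.array([0.5, 2.0, 8.0, 30.0])
n = np.array([0.4, 0.1, 0.02, 0.004])
f = np.array([1.0, 1.0, 1.2, 1.5])
eta = n * m                      # mean density set to 1
eta_d = 1.0 - eta.sum()
v_raw, v_d_raw = np.array([0.8, 1.0, 1.5, 2.5]), 0.6
norm = eta_d * v_d_raw + eta @ v_raw
v, v_d = v_raw / norm, v_d_raw / norm      # normalization (39)
N_d = 3.0

E_general, E_eq47 = optimal_stochasticity(50.0, v, v_d, eta, eta_d, n, f, N_d)
E_low, _ = optimal_stochasticity(1e-9, v, v_d, eta, eta_d, n, f, N_d)
N_m = eta_d**2 * N_d + np.sum(eta**2 * f / n)
```

With $P_h$ set close to zero the result approaches $\eta_d^2 \mathcal{N}_d/\mathcal{N}_m$, the dust share of the total sampling noise.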

The optimized weights and relations between $\mathbf{b}$ and $\mathbf{v}$ are similar to what H10 found in their analysis of $\sigma_w$. E.g., our equation (43) reduces to their equation (41) upon taking $f_i \to 1$, $P_h \to P_{\rm lin}$, and $v_i \to b_i$. In the discussion following their Eq. (36), H10 note that their optimal weight is a linear combination of mass and bias weighting, a point they make again with their Eq. (49). But our expression for the relative contributions of these two weights is substantially more transparent than theirs. For instance, our formulation shows that this factor is, in fact, the one associated with the usual Poisson-sampling model, times a factor which accounts for the dust; this is not obvious from their expressions.



2.5 The halo model description 

The halo model is a specific case of the sampling model in the previous section. The halo model is particularly well-suited to describing the effect of weighting halos (Sheth 2005): it predicts not only C, but also the dust-bin quantities like $\mathcal{N}_d$ and $P_d$ which are not observable and were left unspecified in the previous section. Hence the halo model allows an estimate of the stochasticity $E_w$ associated with any weight function w applied to the halos, so it can be used to estimate $E_{\rm opt}$. (In the context of the optimal weight discussed earlier, it provides a prescription for the effect of bias weighting halos.)

In what follows, we will explicitly set f = 1; comparison of the predictions of this calculation with the measurements in simulations provides a measure of the accuracy of this assumption. In particular, if we define

$$ n_w \equiv \int dm\, \frac{dn}{dm}\, w(m), \qquad (48) $$
$$ \mathcal{N}_w \equiv \int dm\, \frac{dn}{dm}\, \frac{w^2(m)}{n_w^2}, \qquad (49) $$
$$ \mathcal{N}_\times \equiv \int dm\, \frac{dn}{dm}\, \frac{m\, u(k|m)}{\bar\rho}\, \frac{w(m)}{n_w}, \qquad (50) $$
$$ \mathcal{N}_m \equiv \int dm\, \frac{dn}{dm}\, \frac{m^2 |u(k|m)|^2}{\bar\rho^2}, \qquad (51) $$

then

$$ C_{ww} = v_w^2 P_h(k) + \mathcal{N}_w, \qquad (52) $$
$$ C_{wm} = v_w P_h(k) + \mathcal{N}_\times, \qquad (53) $$

where

$$ v_w \equiv \int dm\, \frac{dn}{dm}\, \frac{w(m)}{n_w}\, v(m). \qquad (54) $$

Here v(m) is the bias with respect to $P_h(k)$; it is related to the bias b(m) with respect to the mass field by equation (43). The factor u(k|m) in the expressions above represents the fact that the halo catalog is treated as if all mass is concentrated at the center of mass, but real halo mass is smeared in a density profile: u is the Fourier transform of the density profile, normalized so that $u \to 1$ as $k \to 0$.

Strictly speaking, this writing of the halo model is not quite correct, because halos of a given mass may have a range of density profiles, i.e. u is not the same for all halos, and stochasticity in u will contribute to E. If we use the mean u for a given k in the expressions above, and define $\sigma_u^2(k|m) \equiv \langle |u|^2 \rangle - \langle u \rangle^2$, then the scatter in profile shapes will contribute an additional term

$$ \mathcal{N}_m \to \mathcal{N}_m + \int dm\, \frac{dn}{dm}\, \sigma_u^2(k|m) \left( \frac{m}{\bar\rho} \right)^2. \qquad (55) $$

We expect this additional term to be unimportant at the scales $k < 0.1\,h\,{\rm Mpc}^{-1}$ that are of most interest for cosmology (Sheth & Jain 2003). Section 3.6 shows that this is indeed the case. However, this term grows as $k^4$, so it could dominate the stochasticity at higher k.

The results of the previous section suggest that an optimal linear reconstruction of the mass can be obtained if we use equation (44) for the weight function. If we set $f_i = 1$, then

$$ w_{\rm opt}(m) = \frac{m\, u(k|m)}{\bar\rho} + F_v\, \frac{v(m) P_h(k)}{1 + (n v^2)_h P_h(k)}, \qquad (56) $$

where

$$ F_v \equiv 1 - \int dm\, \frac{dn}{dm}\, \frac{m\, u(k|m)}{\bar\rho}\, v(m), \qquad (57) $$
$$ (n v^2)_h \equiv \int dm\, \frac{dn}{dm}\, v^2(m). \qquad (58) $$

This makes the stochasticity

$$ E_{w_{\rm opt}}^2 = 1 - \frac{C_{wm}^2}{C_{mm} C_{ww}} = 1 - \frac{n_w C_{wm}}{C_{mm}} \qquad (59) $$

equal to the expression given in equation (45). (Note that, for these choices of w and v, $n_w C_{ww} = C_{wm}$ and $E_{w_{\rm opt}}$ is independent of u.)

Thus, we can use the halo model to estimate the stochasticity $E_w$ associated with the weight w of equation (56) as follows. We can measure the mass power spectrum P and the covariance bias b(m) in the simulations, and then use equation (43) to infer $v(m) P_h$. We then use the halo model to estimate $\mathcal{N}_m$, which we subtract from the measured P to get $P_h$, and hence v(m). These can then be used to estimate $F_v$ and $(n v^2)_h$ by summing over the halos. The halo model can also be used to estimate $\mathcal{N}_m$; by directly measuring the component of this that comes from the halos in the sample, one can determine $\mathcal{N}_d$ and so estimate $E_w$. In the following section, we will compare this Poisson sampled and mass-weighted halo model for $E_w$ with the optimal stochasticity in simulations.
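The halo-model quantities (48) to (59) can be evaluated on a discrete mass grid. The sketch below is a toy under loud assumptions: the mass function, the bias relation `v`, the value of `P_h`, the catalog threshold of 1e12 and the choice u = 1 (the k → 0 limit) are all invented for illustration and are not the paper's measurements.

```python
import numpy as np

# Toy halo population on a mass grid (assumed shapes, not fitted to anything).
m = np.logspace(10, 15, 200)                       # halo masses
dn = m**-1.9 * np.exp(-m / 1e14) * np.gradient(m)  # number density per grid cell
dn *= 1e-3 / dn[m >= 1e12].sum()                   # catalog density of 1e-3
rho = np.sum(dn * m)                               # halo model: all mass is in halos
v = 0.6 + 0.4 * (m / 1e13)**0.5                    # toy bias w.r.t. the halo field
v /= np.sum(dn * m * v) / rho                      # mass-weighted mean of v is 1
P_h = 1.0e4
u = np.ones_like(m)                                # k -> 0 limit of the profile

cat = m >= 1e12                                    # halos in the catalog
C_mm = P_h + np.sum(dn * (m * u / rho)**2)         # P_h + N_m, with N_m from (51)

def stochasticity_sq(w):
    """E_w^2 = 1 - C_wm^2 / (C_mm C_ww) for per-halo weights w on the catalog."""
    n_w = np.sum(dn[cat] * w)                                # (48)
    N_w = np.sum(dn[cat] * w**2) / n_w**2                    # (49)
    N_x = np.sum(dn[cat] * (m * u)[cat] / rho * w) / n_w     # (50)
    v_w = np.sum(dn[cat] * w * v[cat]) / n_w                 # (54)
    C_ww = v_w**2 * P_h + N_w                                # (52)
    C_wm = v_w * P_h + N_x                                   # (53)
    return 1.0 - C_wm**2 / (C_mm * C_ww), n_w, C_ww, C_wm

F_v = 1.0 - np.sum(dn[cat] * (m * u * v)[cat]) / rho                  # (57)
nv2 = np.sum(dn[cat] * v[cat]**2)                                     # (58)
w_opt = (m * u)[cat] / rho + F_v * v[cat] * P_h / (1.0 + nv2 * P_h)   # (56)

E_opt_sq, n_w, C_ww, C_wm = stochasticity_sq(w_opt)
E_mass_sq = stochasticity_sq(m[cat] / rho)[0]     # pure mass weighting
E_bias_sq = stochasticity_sq(v[cat])[0]           # pure bias weighting
```

Comparing `E_opt_sq` with `E_mass_sq` and `E_bias_sq` shows, within this toy model, how much is gained by mixing the two weightings rather than using either alone.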



2.5.1 Halo exclusion and other subtleties 

We could have made heavier use of the halo model as follows. The usual implementation (Sheth 2005) replaces $v(m) P_h \to b_{\rm pbs}(m) P_m$, where $b_{\rm pbs}(m)$ is the peak background split bias (e.g. Bardeen et al. 1986; Cole & Kaiser 1989; Mo & White 1996; Sheth & Tormen 1999), and $P_m$ is the power spectrum of the mass, usually approximated by $P_{\rm lin}$ at small k. H10 make this same assumption in their halo model of $\sigma_w$. We will show later that $b_{\rm pbs}(m)$ appears to be closer to $v(m) = (C_{hm} - \mathcal{N}_\times)/(C_{mm} - \mathcal{N}_m)$ than it is to $b(m) = C_{hm}/C_{mm}$. However, note that our discussion indicates that $P_h$ is not to be identified with the mass power spectrum and v(m) is not the same as the linear bias factor between the halo and mass fields. This is one reason why our construction of the halo model above was slightly different from standard. In particular, we did not begin from the mass field $\delta_m$, and immediately set $P_h = P_m$, as is usually done. Rather, we framed our discussion in terms of the field $\delta_h$, and weighted samplings of it. Because we assumed that halos were linearly biased Poisson samplings of this field, we explicitly ignored the fact that, in reality, halos are spatially exclusive, and the sampling function is not just a linear function of the mass. The exclusion property means that the assumption that the halos are obtained from independent sampling processes for every mass bin cannot be correct. Indeed, in the sampling algorithm described in Sheth & Lemson (1999), this lack of independence appears explicitly, and it also contributes to the non-linearity of the bias relation (see their equation 17). [For more recent discussion of the effects of halo exclusion and non-linear bias on $P_m$, and another way of seeing why exclusion alone can produce effects which appear as scale dependent non-linear bias, see Smith et al. (2007).] As we shall see, our neglect of these effects sets a limit to the accuracy of our approach (e.g., the contribution to the stochasticity which comes from bias-weighting the halos may not be optimal).



2.5.2 Halo-mass-dependent selection function 

The results above assume that all halos above a sharp threshold in mass are observed. If the threshold is not sharp, but is a function of mass, $0 \le p(m) \le 1$, then it is straightforward to verify that equation (56) remains the optimal weight, provided that, in the previous expressions for $C_{ww}$, $C_{wm}$, $F_v$ and $(n v^2)_h$ (but not $\mathcal{N}_m$, of course), all occurrences of (dn/dm) are replaced by (dn/dm) p(m). As a result, the sub-sampling decreases $n_w C_{wm}$; since $C_{mm}$ does not change (of course), the sub-sampling degrades (i.e., increases) E.

Table 1. Basic parameters used in the Millennium and NYU simulations. ε is the Plummer-equivalent comoving softening length of the gravitational force; N_re is the number of realizations for each simulation; z_start is the starting redshift for the simulation.

Name        N_p     M_p [h^-1 M_sun]  L_box [h^-1 Mpc]  ε [h^-1 kpc]  Ω_m   Ω_Λ   Ω_b    σ_8  n_s  H_0  z_start  N_re
Millennium  2160^3  8.6 × 10^8        500               5             0.25  0.75  0.045  0.9  1    73   127      1
NYU         640^3   6 × 10^11         1280              20            0.27  0.73  0.046  0.9  1    72   50       49



3 OPTIMAL WEIGHTS AND MASS 
ESTIMATORS FROM SIMULATIONS 

3.1 Simulations 

In this section we use N-body simulations to find optimal 
mass-estimation weights for halos binned by mass (rather 
than, e.g., angular momentum, axis ratio, formation time or 
concentration). We show the stochasticity that results from 
applying the optimal weights and compare to common sub- 
optimal choices. 

For purposes of exploration, we wish to have a wide range of halo
masses, while still having good statistics for large halos and
large-scale modes. For the first purpose, we use the Millennium
simulation (Springel et al. 2005), which resolves halos down to
$\sim 10^{10}\,h^{-1}M_\odot$. For the second, we use the suite of 49
cosmological dark matter "NYU" simulations described in Manera et al.
(2010), which have a total volume $800\times$ larger than the
Millennium, but a minimum resolved halo mass that is $1000\times$
larger. The simulations assume very similar $\Lambda$CDM cosmological
models and were carried out using the same GADGET-2 code (Springel
2005). Table 1 provides details of the basic simulation parameters.

The initial power spectrum was generated by CMBFAST (Seljak &
Zaldarriaga 1996). The initial density field of the Millennium
simulation was realized by perturbing a homogeneous, glass-like
particle distribution with a Gaussian random realization of the
initial power spectrum (White 1996), while the NYU simulations use an
algorithm motivated by Second Order Lagrangian Perturbation Theory
(Scoccimarro 1998). Dark matter halos with at least 20 particles are
identified in both simulations using the friends-of-friends (FOF)
group finder with a linking length of 0.2 times the mean particle
separation (Davis et al. 1985). The lowest halo masses that we use in
the two simulations are $1.7\times10^{10}\,h^{-1}M_\odot$ and
$1.0\times10^{13}\,h^{-1}M_\odot$, respectively. The Millennium
simulation has $\approx 1.8\times10^{7}$ halos at $z=0$, $z=0.5$ and
$z=1$, while the NYU simulations have $\sim 4.3\times10^{7}$,
$3.5\times10^{7}$ and $2.5\times10^{7}$ halos in each of these
redshift slices. For a more detailed analysis of the halo mass
function and bias factors in these simulations, see Springel et al.
(2005) and Manera et al. (2010).
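The quoted volume ratio can be checked directly from the box sizes and realization count in Table 1. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative only and do not come from the simulation pipelines.

```python
# Box sizes (h^-1 Mpc) and number of NYU realizations, as listed in Table 1.
L_MILLENNIUM = 500.0
L_NYU = 1280.0
N_NYU_REALIZATIONS = 49


def total_volume(l_box, n_realizations=1):
    """Total comoving volume in (h^-1 Mpc)^3 for n_realizations cubic boxes."""
    return n_realizations * l_box ** 3


volume_ratio = total_volume(L_NYU, N_NYU_REALIZATIONS) / total_volume(L_MILLENNIUM)
print(f"NYU / Millennium volume ratio: {volume_ratio:.0f}")
```

The result is of order 800, consistent with the round number quoted in the text.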



3.2 Measuring power spectra and covariance 
matrices in simulations 

We start our measurements by dividing halos into bins sorted by mass.
The clustering of halos depends weakly on the halo mass when halo mass
is low but increases rapidly with mass when
$M > 10^{13}\,h^{-1}M_\odot$. Furthermore, halo abundances drop
sharply at high mass. As we do not want a wide range of masses within
a single bin, we include fewer halos per bin at high masses.

We choose bins so that the number of halos in each de- 
creases exponentially at high masses. The optimal weighting 
and stochasticity are robust to changes in the binning, as 
long as the function b(M) is well sampled. We divide the 
halos into 10 bins for the NYU simulations, and use up to 
30 bins for the Millennium simulation. We have tested that 
using more bins does not change our results. 

Within each halo bin, we weight halos by their masses and assign them
to a 3D mesh of $N_g^3 = 256^3$ cubic grid cells using the
cloud-in-cell (CIC) assignment scheme (Hockney & Eastwood 1981), i.e.
we take the Fourier transform of the mass distribution within a halo
bin. This is in anticipation of the result below that optimal
weighting is closer to mass-weighting than number-weighting of halos.
If the bins are narrow in mass, this choice of intra-bin weighting
should have little effect, which we have verified.

We separately Fourier transform the overdensity field of each halo bin
and the total mass distribution. We correct each Fourier mode for the
convolution with the CIC window function by the operation:

\begin{equation}
\delta(\mathbf{k}) \rightarrow \delta(\mathbf{k})
\left[ \frac{\sin(x)\,\sin(y)\,\sin(z)}{x\,y\,z} \right]^{-2},
\tag{60}
\end{equation}

where $\{x, y, z\} = \{k_x L_{\rm box}/2N_g,\ k_y L_{\rm box}/2N_g,\
k_z L_{\rm box}/2N_g\}$, and $N_g$ is the number of grid cells in each
dimension. For each bin in $k$ we construct the covariance matrix of
Fourier coefficients $C_{ij}(k) = \langle \delta_i(k)\,\delta_j(k)
\rangle$, where $i$ and $j$ range over all halo bins, as well as the
mass power $P = \langle \delta_m^2 \rangle$, and hence the covariance
biases $b$ of the halos against the mass. For the NYU simulations we
average results from all 49 realizations to produce a mean covariance
matrix.
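As an illustration of the window correction, the sketch below evaluates the factor that multiplies a single Fourier mode, assuming the standard CIC window (a product of squared sinc functions, one per axis). The function name and argument names are hypothetical and are not taken from the analysis code used here.

```python
import math


def cic_correction(kx, ky, kz, l_box, n_grid):
    """Factor multiplying delta(k) to undo the CIC assignment window.

    Assumes the standard CIC window prod_i [sin(x_i)/x_i]^2 with
    x_i = k_i * l_box / (2 * n_grid); the correction is its inverse.
    """
    window = 1.0
    for k in (kx, ky, kz):
        x = k * l_box / (2.0 * n_grid)
        # sin(x)/x -> 1 as x -> 0, so treat the zero mode explicitly
        window *= 1.0 if x == 0.0 else math.sin(x) / x
    return window ** -2


# Example: a mode at the Nyquist frequency along one axis of a 256^3 mesh
k_nyquist = math.pi * 256 / 500.0  # h Mpc^-1 for a 500 h^-1 Mpc box
print(cic_correction(k_nyquist, 0.0, 0.0, 500.0, 256))
```

The correction is unity for the zero mode and grows toward the Nyquist frequency, which is why it matters most at the high-$k$ end of the measured range.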



3.3 Measured stochasticity and optimal weights 

Figure 1 shows the optimal weights we derive from the simulations
(black solid lines), and compares them with various functions of mass.
Purple dashed lines show $w \propto M$; blue curves show the bias
weighting $w \propto b$ that would be appropriate if the standard
biased-Poisson model were correct, and dotted curves show the optimal
weight in equation (56) derived from the halo model of Section 2.5.

Clearly, neither $M$ nor $b$ is an optimal weight. Indeed, the shape
of $w_{\rm opt}(M)$ depends on the cut-off mass $M_{\min}$ of the halo
catalogue. (This dependence follows that found by H10, in their study
of $c_w$.) When $M_{\min} < 10^{13}\,h^{-1}M_\odot$, the massive end of
$w_{\rm opt}(M)$ is close to mass weighting, as illustrated by the
right-hand plot of Figure 1. When $M$ is close to $M_{\min}$, however,
$w_{\rm opt}(M)$ is flatter than mass weighting. Moreover, the slope
of $w_{\rm opt}(M)$ gets shallower as $M_{\min}$ increases, as shown
in the left-hand plot. Weighting halos by their masses is a poorer
approximation to the optimal weight when
$M_{\min} > 10^{13}\,h^{-1}M_\odot$. The halo model prediction of the
optimal weight is generally in good agreement with the measurements.
The agreement is not perfect, however, especially when $M$ approaches
$M_{\min}$.

Figure 1. The weight function for optimal reconstruction of the mass
field on the scale $k = 0.1\,h\,{\rm Mpc}^{-1}$ at $z = 0$ as a
function of halo mass for the NYU simulations (left) and the
Millennium simulation (right). The optimal weight depends on the
minimum halo mass used in the reconstruction; solid lines (with
arbitrary vertical offset), which are measured from simulations using
Eq. (14), show this dependence. Dashed, dotted and blue curves show
$w \propto M$, equation (56), and $w \propto b$, the latter being
optimal if the Poisson model is correct. The optimal weighting
steepens as $M_{\min}$ decreases, approaching $w \propto M$, although
not exactly along the dotted curves predicted by our halo model
implementation of the sampling model. Cyan and orange lines show the
mean halo occupancy distributions (HODs) for blue galaxies and
luminous red galaxies, respectively, in the low-z SDSS spectroscopic
sample.

[Figure 2 plot: panel titles include k = 0.1 h/Mpc and k = 0.2 h/Mpc;
horizontal axis $M_{\min}$ [$M_\odot/h$]; vertical axis stochasticity
$E$.]

Figure 2. Stochasticity $E$ of the estimators of the mass field
derived from the weights shown in Figure 1, shown as a function of the
minimum mass of halos in the catalog. The three panels show different
$k$ values, all at $z = 0$. In each panel, data at higher $M_{\min}$
are from the NYU simulations; lower $M_{\min}$ measurements are from
the Millennium simulation. From the bottom up: black, purple, blue,
and red solid curves show optimal, mass, bias and uniform weighting of
the halos. Bias weighting would be optimal if the standard biased
Poisson model were correct; it clearly is not. The dashed curve shows
the halo model calculation of $E$, which assumes the mass is a sum of
halos that are Poisson-sampled from some halo field. The failure of
the model at $M < 10^{12}\,h^{-1}M_\odot$ is discussed in the text.

Figure 3. Unweighted (left) and optimally weighted (right) real-space
halo density fluctuations $\delta_h$ versus dark matter density
fluctuations $\delta_m$, smoothed by a spherical top-hat window
function with radius $R = 50\,h^{-1}$ Mpc. The three contour levels
(0.1, 1, 10, shown in green, yellow, red) indicate the relative number
density of data points in the $\delta_m$-$\delta_h$ plane. Diagonal
black solid lines indicate $\delta_h = \delta_m$. Dashed lines show
the fits of the function $\delta_h = b_0 + b_1\delta_m +
b_2\delta_m^2$, with best-fit parameters shown in the figures. Top
panels are results from the Millennium simulation with a minimum halo
mass of $M_{\min} = 1.4\times10^{10}\,h^{-1}M_\odot$. Bottom panels
show results from the NYU simulations, with
$M_{\min} = 3.1\times10^{13}\,h^{-1}M_\odot$.



Figure 2 shows the stochasticity $E$ associated with these linear
estimators $\hat{\delta}_m$ of the mass distribution, as a function of
the minimum mass $M_{\min}$ of halos included in the sample. Black,
purple, blue, and red solid curves show $E$ derived from optimal,
mass, bias and uniform weighting of the halos. Weighting halos by
their masses yields lower $E$ than bias weighting or equal weighting,
but is significantly worse than the optimal when the halo catalog has
$M_{\min} \sim 10^{13}\,h^{-1}M_\odot$ or lower. Bias weighting would
be optimal if the standard biased Poisson model were correct, but is
clearly far from optimal for halos in N-body simulations. The dashed
curves show the halo model description of $E_{\rm opt}$ (equation 59).
The model agrees with the measurements at
$M_{\min} > 10^{12}\,h^{-1}M_\odot$, but it does not predict the
inflection we measure at smaller $M_{\min}$. We discuss this further
in Section 3.6.

Figure 4. Left: The effective source density $(nb^2)_{\rm eff}$ of an
optimally-weighted mass estimator is plotted (solid) vs $k$ for
catalogs with different halo mass cutoffs $M_{\min}$ as labeled. This
measure of stochasticity is found to vary little with scale in the
linear regime $k < 0.1\,h\,{\rm Mpc}^{-1}$. The dashed lines show the
$nb^2$ expected via Equation (32) under a biased-Poisson model of
galaxy stochasticity. The dotted line shows the $(nb^2)_{\rm eff}$
required to attain $E = 0.5$, i.e. a volume-limited power spectrum
measurement. Right: For three different redshifts, we plot the optimal
$(nb^2)_{\rm eff}$ vs the minimum mass of the halo catalog used for
mass reconstruction (solid lines). The dashed lines plot the $nb^2$
that we would expect if halos were a Poisson sampling of the mass
distribution, as per Equation (32). Because halos comprise the mass
rather than sampling the mass, the $(nb^2)_{\rm eff}$ is up to
$15\times$ higher, i.e. the mass estimator is less stochastic, than
the Poisson model predicts.

As an additional check of our numerical methods, we 
have verified that inclusion of the "dust bin" in the estimator 
leads to a perfect mass estimator (E = 0) with optimal 
weights directly proportional to halo mass. 



3.4 The scatter between the halo field and the 
mass field 

To illustrate the gain from applying the optimal weights, we show in
Figure 3 the scatter between the fluctuations of the halo field
$\delta_h$ and the mass density field $\delta_m$, before and after
applying the optimal weights. Notice that $\delta_h$ and $\delta_m$
are both density contrasts in configuration space that are smoothed by
the same spherical top-hat window function. We do the smoothing by
multiplying the density contrasts in Fourier space by the Fourier
transform of the window function,
$\delta^{R}_{h,m}(k) = W_R(k)\,\delta_{h,m}(k)$, where
$W_R(k) = 3[\sin(kR) - kR\cos(kR)]/(kR)^3$ and $R$ is the radius of
the window function. Then we Fourier transform back and get the
smoothed $\delta_h$ and $\delta_m$.
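The top-hat window above is simple to evaluate numerically, with one caveat: the closed form is 0/0 at $kR = 0$. A minimal sketch follows; the small-argument series and the switch-over threshold are choices made for this example, not part of the analysis described here.

```python
import math


def tophat_window(k, radius):
    """Fourier transform of a spherical top-hat of the given radius.

    W_R(k) = 3 [sin(kR) - kR cos(kR)] / (kR)^3, with the leading terms of
    its Taylor series, 1 - (kR)^2 / 10, used near kR = 0 to avoid 0/0.
    """
    x = k * radius
    if abs(x) < 1e-3:
        return 1.0 - x * x / 10.0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3


# Window for R = 50 h^-1 Mpc at a few wavenumbers (h Mpc^-1)
for k in (0.0, 0.01, 0.05, 0.1):
    print(k, tophat_window(k, 50.0))
```

For $R = 50\,h^{-1}$ Mpc the window already suppresses modes strongly by $k \sim 0.1\,h\,{\rm Mpc}^{-1}$, so the smoothed fields in Figure 3 are dominated by linear-regime modes.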

We fit the scatter plots with the polynomial function
$\delta_h = b_0 + b_1\delta_m + b_2\delta_m^2$, to see if there is any
indication of a non-linear bias factor $b_2$. We usually find very
small fitted values of $b_2$, especially for the optimally weighted
cases. We also find that $b_2$ increases as the low-mass cut of the
halo sample is raised. In general, we see a significant improvement
from applying the optimal weights, indicated by the shrinking of the
scatter. This shows that the optimal weights indeed work well, without
any higher-order bias correction.
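A fit of this quadratic form can be sketched with ordinary least squares. The example below solves the 3x3 normal equations by hand so that it needs no external libraries; it illustrates the functional form only and is not the fitting code used for Figure 3. The synthetic data at the end are invented for the demonstration.

```python
def fit_quadratic_bias(delta_m, delta_h):
    """Least-squares fit of delta_h = b0 + b1*delta_m + b2*delta_m**2.

    Builds the normal equations A b = y, with A[p][q] = sum(x**(p+q)) and
    y[p] = sum(h * x**p), and solves them by Gaussian elimination.
    """
    a = [[sum(x ** (p + q) for x in delta_m) for q in range(3)] for p in range(3)]
    y = [sum(h * x ** p for x, h in zip(delta_m, delta_h)) for p in range(3)]
    for col in range(3):
        # partial pivoting for numerical stability
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        y[col], y[pivot] = y[pivot], y[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for c in range(col, 3):
                a[row][c] -= factor * a[col][c]
            y[row] -= factor * y[col]
    b = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):  # back substitution
        tail = sum(a[row][c] * b[c] for c in range(row + 1, 3))
        b[row] = (y[row] - tail) / a[row][row]
    return tuple(b)


# Synthetic, noise-free example with made-up coefficients
xs = [i / 10.0 - 0.5 for i in range(11)]
hs = [0.01 + 1.2 * x + 0.05 * x * x for x in xs]
b0, b1, b2 = fit_quadratic_bias(xs, hs)
print(b0, b1, b2)
```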

3.5 Scale dependence of stochasticity 

Figure 4 illustrates that the optimal $(nb^2)_{\rm eff}$ is nearly
independent of $k$ at fixed $M_{\min}$ in the linear regime, where
$(nb^2)_{\rm eff}$ is related to $E_{\rm opt}$ by equation (15). Since
both $(nb^2)_{\rm eff}$ and the Poisson prediction $(nb^2)$ (the
simple bias-weighted sum over the halo population) are nearly constant
across the linear regime, we compare them in the right panel of Figure
4 as a scale-independent measure of stochasticity. We find that at all
redshifts and $M_{\min}$ values, the achievable $(nb^2)_{\rm eff}$ is
significantly better (higher) than would have been expected in the
model where halos are a Poisson sampling of the mass. The ratio
$(nb^2)_{\rm eff}/(nb^2)$ can be as high as $\approx 15$ for surveys
of $M > 10^{12}\,h^{-1}M_\odot$ halos at $z = 0$. Even for surveys
limited to massive clusters, $M_{\min} \approx
10^{14}\,h^{-1}M_\odot$, the effective source density of the optimal
estimator is $\approx 2\times$ better than the Poisson model predicts.



3.6 Departures from the halo model 

The dashed curve in Figure 2 plots the halo model prediction of the
optimal $E$, which assumes the mass is comprised of halos that are
Poisson-sampled from some halo field. The $E_{\rm opt}$ from
simulations becomes shallower for $M_{\min} < 10^{12}\,h^{-1}M_\odot$,
whereas the halo model does not. Either the halo model is not accurate
at $M_{\min} < 10^{12}\,h^{-1}M_\odot$, or there is some bias in the
numerical estimation of $E_{\rm opt}$ from the simulation catalogs.

Calculating $E^2_{\rm opt}$ at $M_{\min} < 10^{12}\,h^{-1}M_\odot$
sets heavy demands on the measurement of the covariances in the
simulation, because $b^T C^{-1} b$ must be calculated to a fractional
accuracy of $E^2_{\rm opt} < 10^{-3}$ in this regime. We have
considered the possibility that $E_{\rm opt}$ levels off at small
$M_{\min}$ because of discreteness effects. Specifically, in the
simulations, mass comes in units of $m_p$, so the "dust" is not made
of arbitrarily small halos. This makes $M_d \rightarrow M_d + 1/n_d$
approximately. However, this additional factor is too small to explain
the flattening we see. We have also verified that the plateau in
$E_{\rm opt}$ is unaffected by the size of the bins in mass or $k$.

As noted earlier, the stochasticity $E$ will be degraded if halos of a
given mass have varying internal structure, while we treat halos of a
given mass as being identical point masses in the analysis. Within the
halo model, the stochasticity adds to $M_m$ as per equation (55). We
test the magnitude of this effect by creating new halo overdensity
maps from the full sample of N-body particles belonging to halos in
each mass bin. These "true" halo mass maps are then used to create an
optimal mass estimator. Figure 5 shows that the point-mass
approximation has negligible impact on $E$ at
$k < 0.15\,h\,{\rm Mpc}^{-1}$, but at higher $k$ the mass estimator is
increasingly degraded by the absence of information on the variability
of internal structure of massive halos.

Even when we use the full halo mass distributions in our optimal
estimator, the performance for $M_{\min} < 10^{12}\,M_\odot$ remains
worse than the $E$ predicted by the halo model. It is possible that
noise in $b$ and $C$ from having too few modes is compromising our
measurements of $E_{\rm opt}$ in the Millennium simulation. This would
most strongly affect the low-$k$ region where modes are scarcest;
however, the inflection does not exhibit this behavior.

We conclude that we have reached the limits of the assumption that
halos are a Poisson sampling of an underlying halo field. For
$M \approx 10^{12}\,h^{-1}M_\odot$, $\eta_d \approx 0.66$, so the
effects of exclusion/mass conservation are beginning to matter. It is
therefore perhaps not surprising that the optimal mass reconstruction
is poorly described by a model that presumes halos to be independently
sampled from the halo field. We defer to future work the investigation
of mass reconstruction in the presence of exclusion and other
non-linear effects.



3.7 Explicit test of the sampling model 

Are the observed covariance matrices of the halos consistent with
their being stochastic discrete realizations of an underlying "halo
field" $\delta_h$, the model of §2.4.1? We answer this question by
asking how well the elements $C_{ij}$ of the simulations' covariance
matrices can be fit by appropriate values of the $v_i$ and $f_i$. We
find the $\{v_i, f_i\}$ which minimize the L2 norm of the residual to
the model (34):

\begin{equation}
\|\delta C\|^2 = \sum_{ij} \left( v_i v_j + \delta_{ij} f_i/(n_i P)
- C_{ij}/P \right)^2 .
\tag{61}
\end{equation}

(We use the mass power $P$ instead of $P_h$ in the fit, which
slightly changes the values of $v_i$ and $f_i$ without affecting the


[Figure 5 plot: vertical axis stochasticity $E$ (tick labels 0.01 and
0.1); horizontal axis $k$ [$h$/Mpc].]



Figure 5. The optimal stochasticity $E$ as a function of wavenumber
$k$ from using all halos in the Millennium simulation is shown for two
cases: the upper red curve treats each halo as a point mass, while the
lower black curve uses the full spatial distribution of the particles
comprising the mass of each halo. If the halo catalog does not contain
information on the variability of internal halo structure at a given
mass, it cannot fully map the mass distribution. This significantly
degrades $E$ for $k > 0.15\,h\,{\rm Mpc}^{-1}$, but does not explain
the inflection in $E_{\rm opt}$ observed at
$k < 0.1\,h\,{\rm Mpc}^{-1}$ for $M < 10^{12}\,h^{-1}M_\odot$ in
Figure 2.



quality of the fit.) Figure 6 plots the best-fitting values of $f_i$
and $v_i$ for each halo mass bin in the NYU simulations, along with
the bias $b_i$. The fitted values of $f_i$ slightly exceed unity, but
this is consistent with halos being a Poisson sampling ($f_i = 1$) of
a "halo field" because the mass-weighting within each halo bin will
cause $f_i$ to rise slightly above unity, particularly in the most
massive bin.
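To make the fitted quantity concrete, the sketch below builds the sampling-model covariance $C_{ij} = v_i v_j P + \delta_{ij} f_i / n_i$ and evaluates the residual norm of equation (61) for a given measured matrix. It only evaluates the objective; the minimization over $\{v_i, f_i\}$ is not shown, and all function names and the two-bin numbers are illustrative assumptions.

```python
def sampling_model_cov(v, f, n, power):
    """Model covariance C_ij = v_i v_j P + delta_ij f_i / n_i."""
    size = len(v)
    return [
        [v[i] * v[j] * power + (f[i] / n[i] if i == j else 0.0) for j in range(size)]
        for i in range(size)
    ]


def residual_norm_sq(cov, v, f, n, power):
    """Sum over i, j of (v_i v_j + delta_ij f_i/(n_i P) - C_ij/P)^2."""
    model = sampling_model_cov(v, f, n, power)
    size = len(v)
    return sum(
        ((model[i][j] - cov[i][j]) / power) ** 2
        for i in range(size)
        for j in range(size)
    )


# Two hypothetical halo bins: biases, Poisson factors, number densities, P(k)
v = [1.0, 1.5]
f = [1.0, 1.1]
n = [1e-3, 1e-4]
power = 1e4
exact = sampling_model_cov(v, f, n, power)
print(residual_norm_sq(exact, v, f, n, power))
```

Because each $f_i$ enters only the corresponding diagonal element, the diagonal can always be matched exactly, which is why the quality of the fit is judged from the off-diagonal elements.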

The fit to the sampling model does confirm a departure from the simple
biased-Poisson model, however, in that the biases $v_i$ of the halos
with respect to the parent halo field are not equal to the covariance
biases $b_i$ with respect to the mass field. There is a divergence of
$b$ from $v$ toward the trans-linear regime.

The peak-background split model yields an analytic prediction
$b_{\rm pbs}$ for the bias of halos vs the underlying mass
distribution. Figure 7 shows that for the NYU simulations, it is the
bias $v$ of the halos vs the halo field $\delta_h$ that is best
described by $b_{\rm pbs}$, not the bias $b$ of the halos vs the mass
distribution. This observation should lead to a deeper theoretical
understanding of the halo distribution. In particular, it may help
resolve the discrepancy reported by Manera et al. (2010) between
$b_{\rm pbs}$ and their measurements of halo bias, which were
effectively what we call $b$, rather than $v$.

The quality of the fit of the sampling model to the co- 
variance matrix can be gauged by the ratio 



R = 



V¥c\ 



(62) 



This quantity measures the ratio of the RMS error in elements of $C$
to the RMS value of the model, excluding the diagonal elements, which
are always perfectly fit by adjusting the $f_i$. For the NYU
simulations at $z = 0$ we find $C$ well fit by this model:
$0.01 < R < 0.035$ for all $k < 0.25\,h\,{\rm Mpc}^{-1}$. For the bin
at $k = 0.07\,h\,{\rm Mpc}^{-1}$ in the NYU simulations, we would
expect $R \approx 0.02$ from Gaussian sample variance in the
estimation of the $C_{ij}$. At higher $k$, the sample variance in the
estimation of $C$ should drop, so it is likely that by
$k = 0.2\,h\,{\rm Mpc}^{-1}$ the residuals of $C$ to the sampling
model are in excess of statistical errors. We expect effects such as
halo exclusion and halo internal structure to induce departures from
the simplest sampling models at scales approaching the halo sizes.

The sampling model is not able to fit the $C$ matrix across the full
mass range of the $z = 0$ Millennium halo catalog, yielding
$R > 0.1$. Excluding the two most massive of 30 halo bins from $C$
permits a solution with $R \sim 0.07$ in the linear regime; however,
this solution requires values of $f_i < 1$ or even negative $f_i$ for
the least massive halos. We take this as an additional sign that
low-mass halos cannot be considered to populate the halo field
independently, e.g. exclusion may be important. The much smaller
volume of the Millennium simulation leads to larger statistical
fluctuations in the elements of $C$, so at this time we refrain from
detailed analysis of departures from the sampling model for
less-massive halos.

Figure 6. The best fit of the sampling model
$C_{ij} = v_i v_j P + \delta_{ij} f_i/n_i$ to the $z = 0$ halo catalog
of the NYU simulation is shown as a function of $k$. The solid lines
in the upper panel plot the bias $b_i$ of halos in each mass bin,
while the dashed lines are the best-fit $v_i$ for each bin.
Higher-mass bins have higher bias. Note that $v_i < b_i$ for massive
halos, by an amount that grows at $k > 0.1\,h\,{\rm Mpc}^{-1}$, but
that $v_i > b_i$ at lower masses, as predicted. The lower panel shows
the best-fit $f_i$. Poisson sampling would induce values slightly
above $f_i = 1$ because of the mass weighting within a halo bin,
particularly for the most massive bin, as observed.

Figure 7. For the full NYU halo sample at $z = 0$, we plot three types
of bias. The open symbols are the covariance bias $b$ vs the mass
distribution. The solid symbols are the bias $v$ vs the underlying
continuum "halo field" $\delta_h$ in the sampling model that best fits
the halo-mass covariance matrix. The curve is the analytic bias
prediction of the peak-background split model.



4 PRACTICAL CONSEQUENCES

With an optimal (linear) mass reconstruction algorithm in hand, we can
examine strategies for conducting cosmological measurements with
maximal efficiency.



4.1 Power spectrum measurement with halo mass 
estimates 

We first posit a survey attempting to measure the shape of the matter
power spectrum near $k = 0.2\,h\,{\rm Mpc}^{-1}$, e.g. a baryon
acoustic oscillation measurement. Since the error in a power spectrum
determination from $N_m$ modes of a stochastic estimator is
$\sigma_P/P = [(1 - E^2) N_m/2]^{-1/2}$, it is typical to consider a
survey with $E < 0.5$ as sample-variance limited. This is equivalent
to the criterion $(nb^2)_{\rm eff} P = 3$. [Here we assume that the
bias of the estimator is known or immaterial.]
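The two relations in this paragraph can be turned into a short numerical check. The sketch below assumes the closed form $E^2 = 1/[1 + (nb^2)_{\rm eff} P]$, which reproduces the quoted correspondence between $E = 0.5$ and $(nb^2)_{\rm eff} P = 3$; that closed form and the function names are assumptions of this example rather than a quotation of the relation referenced earlier in the paper.

```python
import math


def stochasticity(nb2_eff_p):
    """E for a given (nb^2)_eff * P, assuming E^2 = 1 / (1 + (nb^2)_eff P)."""
    return 1.0 / math.sqrt(1.0 + nb2_eff_p)


def fractional_power_error(e, n_modes):
    """sigma_P / P = [(1 - E^2) N_m / 2]^(-1/2) for N_m modes."""
    return ((1.0 - e * e) * n_modes / 2.0) ** -0.5


e_limit = stochasticity(3.0)
# Error inflation relative to a noiseless (E = 0) estimator, same N_m
inflation = fractional_power_error(e_limit, 1000) / fractional_power_error(0.0, 1000)
print(e_limit, inflation)
```

At $E = 0.5$ the power spectrum error is only about 15% larger than for a perfect estimator, which is the sense in which such a survey is sample-variance limited.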

If the redshift of a halo can be obtained with a single spectrum of
its central galaxy, then clearly the strategy for attaining a mass
estimator with a given stochasticity with the fewest redshift
measurements is to measure redshifts for all halos above a chosen mass
limit. In practice one could identify candidate halos from multicolor
imaging data using a variant of red-sequence detection or
cluster-finding with photometric redshifts (e.g. Koester et al. 2007;
Gladders & Yee 2000).

If imaging data can be used successfully to identify clusters and
estimate their host halo masses, then the more expensive spectroscopic
redshift survey need target only the brightest member of each cluster
or group to determine a redshift for the presumed dark matter halo.
X-ray or Sunyaev-Zeldovich survey data could also potentially
contribute to halo finding and mass estimation.

What is the minimum number of spectra that one must obtain to make the
volume-limited power spectrum measurement described above? We answer
this question for the case when halos are optimally weighted and, for
comparison, give the (incorrect) prediction for halos that populate
the mass according to the biased-Poisson model. We list in Table 2 the
minimal halo mass and minimal halo number density for these two cases
at three different redshifts. The optimally weighted case needs a
factor 4-12 fewer spectra than predicted by the biased-Poisson model
to achieve the sample-variance limit $E < 0.5$.

To complete a volume-limited survey for $0 < z < 1$ that achieves
$(nb^2)_{\rm eff} P = 3$ at the BAO scale
$k \sim 0.2\,h\,{\rm Mpc}^{-1}$, the total number of spectra one needs
is $\approx 6\times10^{6} f_{\rm sky}$ if halos are optimally
weighted. If the biased-Poisson model were correct, one would require
$\sim 33\times10^{6} f_{\rm sky}$, a factor of 5 more. For a redshift
survey to $z < 0.7$ such as the ongoing SDSS-III Baryon Oscillation
Spectroscopic Survey (BOSS), the two cases yield
$2\times10^{6} f_{\rm sky}$ and $14\times10^{6} f_{\rm sky}$,
respectively. Hence BOSS at $f_{\rm sky} = 0.25$ would require only
500,000 optimally targeted and weighted redshifts to achieve
$(nb^2)_{\rm eff} P = 3$, while the survey plans to obtain 1.5 million
redshifts. Given that precise halo masses are not easily accessible,
the weighting for a real survey may be sub-optimal. We will show in
Section 4.3 that if one weights halos by the numbers of LRGs, a factor
of 3 more redshifts are required to attain $E < 0.5$.
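The survey sizes quoted above scale linearly with sky fraction, so the BOSS figure follows from a one-line calculation. A minimal sketch, using only the per-$f_{\rm sky}$ numbers given in the text (the dictionary layout and function name are illustrative):

```python
# Required spectra per unit f_sky, as quoted in the text.
SPECTRA_PER_FSKY = {
    "z<1": {"optimal": 6e6, "poisson": 33e6},
    "z<0.7": {"optimal": 2e6, "poisson": 14e6},
}


def required_spectra(depth, model, f_sky):
    """Number of redshifts needed for (nb^2)_eff P = 3 over a sky fraction f_sky."""
    return SPECTRA_PER_FSKY[depth][model] * f_sky


boss_optimal = required_spectra("z<0.7", "optimal", 0.25)
boss_poisson = required_spectra("z<0.7", "poisson", 0.25)
print(boss_optimal, boss_poisson, boss_poisson / boss_optimal)
```

For the BOSS-like case the biased-Poisson estimate is seven times the optimally weighted requirement, and for the full $z < 1$ survey the ratio of the quoted numbers is 5.5.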



4.2 Bias calibration with weak lensing 

A more ambitious measurement is to cross-correlate the mass
distribution estimated from a galaxy redshift survey with a weak
gravitational lensing shear map, thereby calibrating the bias of the
estimator (Pen 2004). Measures of the redshift dependence of the
cross-correlation between lensing and matter can also strongly
constrain the curvature and $D(z)$ function of the Universe (Bernstein
& Jain 2004). A simplified analysis of these problems considers the
covariance matrix between the gravitational convergence field $\kappa$
and the galaxy-based mass estimator $g$ to be

\begin{equation}
C = \begin{pmatrix}
P + \mathcal{N}_\kappa & bP \\
bP & b^2 P + \mathcal{N}_g
\end{pmatrix},
\tag{63}
\end{equation}

where $\mathcal{N}_\kappa$ is the noise in the weak lensing mass
estimation. To fully exploit the lensing data, the mass estimator
should attain $E^{-2} - 1 = b^2 P/\mathcal{N}_g >
P/\mathcal{N}_\kappa$, such that the S/N ratio per mode of the mass
estimator exceeds the S/N ratio per mode of the lensing map.

If the lensing noise level $\mathcal{N}_\kappa$ is known and one is
inferring $P$, $b$, and $\mathcal{N}_g$ from the values of the lensing
and mass estimators in $N_m$ modes of the sky, then the marginalized
error in the estimate of $b$ becomes

\begin{equation}
\frac{\sigma_b}{b} = \left[
\frac{ (1 + \mathcal{N}_\kappa/P)
\left( 1/[(nb^2)_{\rm eff} P] + \mathcal{N}_\kappa/P \right)
+ (\mathcal{N}_\kappa/P)^2 }{ N_m }
\right]^{1/2}.
\tag{64}
\end{equation}

When the lensing and mass reconstruction both have high S/N per mode,
this becomes

\begin{equation}
\frac{\sigma_b}{b} = \left[
\frac{ \mathcal{N}_\kappa/P + E^2 }{ N_m }
\right]^{1/2}.
\tag{65}
\end{equation}

Figure 8. The vertical axis is the number of modes $N_m$ that must be
measured for a 1% measure of the bias of an optimal mass estimator
using the cross-correlation of lensing and a galaxy redshift survey.
It is plotted vs the density of sources in the redshift survey. The
solid lines are for an optimal survey which targets one galaxy in each
halo above some mass limit. The dashed lines assume that target
galaxies are predominantly selected from halos with $b \approx 1$. The
dashed blue line follows $N_m \propto n^{-5/6}$, illustrating that the
total number of galaxies $\propto nN_m$ required in the optimal survey
is a weak function of survey depth.



http://www.sdss3.org/cosmology.php 



As an example, consider a lensing source plane consisting of $n = 30$
galaxies arcmin$^{-2}$ at $z_s = 1$ being cross-correlated with a
transverse mass mode at $k = 0.1\,h\,{\rm Mpc}^{-1}$ at $z = 0.5$,
near the peak of the lensing kernel. The shear signal will appear at
multipole $\ell = kD \approx 130$, where the total power in the shear
signal is $C_\ell \approx 8\times10^{-8}$. The shape noise power is
$\sigma^2/n \approx 2\times10^{-10}$, giving
$P/\mathcal{N}_\kappa \approx 400$, or a S/N per mode of $\sim 20$.
The cosmological measurements will hence continue to benefit from
higher halo survey density until $E \ll 0.05$. Even with optimal halo
weighting this does not occur until $M_{\min} < 10^{12}\,M_\odot$. In
most of the relevant regime, the optimal mass weighting requires 10 or
more times fewer surveyed redshifts than the Poisson formulae would
have suggested.
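The numbers in this example can be reproduced with a few lines. The sketch below evaluates the marginalized bias error in the form of equations (64) and (65) and the lensing S/N per mode for the quoted $C_\ell$ and shape noise; the relation $E^2 = 1/[1 + (nb^2)_{\rm eff} P]$, the function names, and the trial value $(nb^2)_{\rm eff} P = 1000$ are assumptions made for illustration.

```python
import math


def bias_error(nb2_eff_p, noise_to_power, n_modes):
    """Fractional error on b, general form (equation 64).

    noise_to_power is N_kappa / P and nb2_eff_p is (nb^2)_eff * P.
    """
    nk = noise_to_power
    variance = ((1.0 + nk) * (1.0 / nb2_eff_p + nk) + nk ** 2) / n_modes
    return math.sqrt(variance)


def bias_error_high_sn(e, noise_to_power, n_modes):
    """High-S/N limit (equation 65): sqrt((N_kappa/P + E^2) / N_m)."""
    return math.sqrt((noise_to_power + e * e) / n_modes)


def modes_for_target(nb2_eff_p, noise_to_power, target=0.01):
    """Number of modes needed to reach a target fractional error on b."""
    return bias_error(nb2_eff_p, noise_to_power, 1.0) ** 2 / target ** 2


# Worked example from the text: C_ell ~ 8e-8, shape noise ~ 2e-10
power_to_noise = 8e-8 / 2e-10
lensing_sn_per_mode = math.sqrt(power_to_noise)
# E at which the halo term equals the lensing term, i.e. E^2 = N_kappa / P
e_crossover = math.sqrt(1.0 / power_to_noise)
print(power_to_noise, lensing_sn_per_mode, e_crossover)
```

The crossover at $E = 0.05$ is where the two terms in equation (65) are equal, so the halo survey keeps paying off until $E$ is well below that value.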

Table 2. The number density of halos $n^{\rm opt}$ and the
corresponding minimal halo mass $M^{\rm opt}_{\min}$ needed to achieve
$(nb^2)_{\rm eff} P = 3$ ($E = 0.5$) at BAO scales
($k = 0.2\,h\,{\rm Mpc}^{-1}$) when applying the optimal weighting.
For comparison, the minimal halo mass $M^{\rm P}_{\min}$ and number
density $n^{\rm P}$ needed to achieve the same accuracy in a Poisson
model are also listed. The last column gives the ratio of required
redshift measurements in the two models.

Given the high source density required to saturate the accuracy in
$b$, we ask whether it is more efficient to conduct a deep survey or a
shallower survey covering more sky and hence more modes. Figure 8
plots the number of modes that must be observed to determine $b$ to 1%
accuracy, vs the space density $n$ of halos surveyed, as we lower
$M_{\min}$ of the survey. We find $N_m \propto n^{-5/6}$ describes the
results well. The total number of redshifts to be obtained in the
survey is $nN_m \propto n^{1/6}$, hence this measure of the survey's
expense depends very weakly on depth. We also note that the source
density required for the bias measurement is nearly independent of
redshift using the optimal survey strategy. In contrast, a survey of
number-weighted emission-line galaxies, plotted with the dashed lines,
requires $> 15\times$ higher $n$ than the optimal strategy at $z = 0$,
and galaxy densities that increase to $z = 1$.
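The weak dependence of survey cost on depth follows directly from the fitted scaling $N_m \propto n^{-5/6}$. A minimal sketch of that arithmetic, with illustrative function names:

```python
def modes_ratio(density_ratio):
    """Change in required modes N_m when n changes by density_ratio (N_m ~ n^(-5/6))."""
    return density_ratio ** (-5.0 / 6.0)


def total_redshift_ratio(density_ratio):
    """Change in total redshifts n * N_m, which scales as n^(1/6)."""
    return density_ratio * modes_ratio(density_ratio)


# A survey 10x denser needs far fewer modes but only modestly more redshifts
print(modes_ratio(10.0), total_redshift_ratio(10.0))
```

A tenfold increase in halo density therefore raises the total number of redshifts by less than 50%, under the assumption that the fitted power law holds over that range.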



4.3 Galaxies as weights 

Most measurements of large-scale structure to date have used galaxy
densities as mass estimators. A spectroscopic survey of galaxies can
be thought of as weighting halos by the number of galaxies per halo
(Scoccimarro et al. 2001). The halo occupancy distribution (HOD) gives
the probability $p(N|M)$ of finding $N$ galaxies in a halo of mass
$M$.

The use of galaxies as mass tracers will be inferior to 
an optimal halo-weighting scheme, in the sense of having 
higher stochasticity E for a given number of redshifts, for 
three reasons: 

(i) Ideally only one redshift per halo is needed, so a galaxy 
survey is in some sense wasting spectroscopic resources if 
more than one galaxy is targeted per halo. 

(ii) The mean HOD $g(M) = \langle p(N|M) \rangle$ may not match
the optimal halo weighting.

(iii) The occupancy of a given halo is an integer drawn
from the HOD, which adds a source of stochasticity to the
weight assignment and can propagate into increased stochasticity
in the mass estimation, even if the mean HOD is close
to the optimal weight.

The first penalty is typically not large: the HOD is typically divided
into the probability $f_c(M)$ of having a single central galaxy in the
halo, plus a distribution $p(N_s|M)$ of the number $N_s$ of satellite
galaxies. The latter is typically taken to be a Poisson distribution
with a mean $\bar{N}_s(M)$. For most galaxy samples, the fraction of
satellites is 10-20%, a small perturbation to the number of redshift
targets.

This also implies, however, that $\approx 90\%$ of the halos detected
in a survey are occupied by a single target galaxy, and hence given
equal weight. We are then drawn to the second issue: does the mean HOD
$g(M)$ serve as a nearly-optimal weight function? In Figure 1 we plot
the mean number of blue galaxies (cyan lines) and LRGs (orange lines)
in each halo as a function of halo mass, as determined from SDSS data.

For a simple HOD in which $f_c$ is a step function, the mean HOD is

\begin{equation}
g(M) = \begin{cases}
1 + (M/M_1)^{\alpha} & M \ge M_0 \\
0 & M < M_0 .
\end{cases}
\tag{66}
\end{equation}

$M_1$, $\alpha$, and the cutoff $M_0$ are dependent upon the
luminosity cut or other criteria used to define the galaxy sample. For
luminosity-limited samples, typical values are $M_1 \approx 20 M_0$
and $\alpha \approx 1$ (Zehavi et al. 2005, 2010).

But note the similarity of this mean HOD functional form to Equations
(44) and (56) for the optimal weight. The optimal weight for a catalog
of halos with mass $M > M_{\min}$ is very well approximated by a
function of the form $w(M) = 1 + (M/\beta M_{\min})^{\alpha}$, where
$\alpha$ and $\beta$ depend on $M_{\min}$. For low $M_{\min}$,
$\alpha \approx 1$, with $\alpha$ decreasing at higher $M_{\min}$. For
$M_{\min}$ in the range 10[...]-10[...] $M_\odot$ at $z = 0$, values
of $3 < \beta < 9$ yield the least stochasticity, with $E$ values
indistinguishable from the optimal weighting.

Hence by a useful coincidence, the mean HOD for
luminosity-selected galaxies bears close resemblance to an
optimal halo weighting function, except that the HOD tends
to have a longer flat low-mass plateau ($\beta \approx 20$) than the
optimal weight ($\beta \approx 4$). H10 noted that the optimal weight
function they derived is well approximated by equation (66),
with $\beta \sim 3$, but they did not make the connection to galaxy
HODs.
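The difference in plateau length can be made concrete with a short numerical comparison. The sketch below is illustrative only: it evaluates the step-function mean HOD, $1 + (M/M_1)^{\alpha}$ with $M_1 = 20 M_0$, against the approximate optimal form $1 + (M/\beta M_{\min})^{\alpha}$ with $\beta = 4$. The cutoff mass, the sample masses and the function names are assumptions chosen for the example, not values from our analysis.

```python
def mean_hod(mass, m0, m1, alpha=1.0):
    """Step-function mean HOD: 1 + (M/M1)^alpha above the cutoff M0, else 0."""
    if mass < m0:
        return 0.0
    return 1.0 + (mass / m1) ** alpha


def approx_optimal_weight(mass, m_min, beta, alpha=1.0):
    """Approximate optimal weight w(M) = 1 + (M / (beta * M_min))^alpha."""
    if mass < m_min:
        return 0.0
    return 1.0 + (mass / (beta * m_min)) ** alpha


m_min = 1.0e12  # assumed cutoff in h^-1 Msun, for illustration only
masses = [1e12, 4e12, 2e13, 1e14]  # assumed sample masses

rows = []
for m in masses:
    g = mean_hod(m, m0=m_min, m1=20.0 * m_min)      # HOD-like plateau, beta ~ 20
    w = approx_optimal_weight(m, m_min, beta=4.0)   # optimal-like plateau, beta ~ 4
    rows.append((m, g, w, g / w))

for m, g, w, ratio in rows:
    print(f"M = {m:.0e}  g = {g:6.2f}  w = {w:6.2f}  g/w = {ratio:.2f}")
```

The ratio g/w falls with increasing mass, showing that the HOD-like form gives massive halos less weight relative to low-mass halos than the optimal-like form does.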

How far from optimal are the mean HODs? We examine
the case of luminous red galaxies (LRGs) first. We obtain
the mean number of LRGs from the HOD fitting results
of Zheng et al. (2009), using equation (B3) in their paper
to model the dependence of the model parameters on $\sigma_8$. The
mean HOD is shown in orange in Figure [T]. Figure 9 plots
the stochasticity $E$ vs the space density $n$ of target halos
at $k = 0.1\,h\,\mathrm{Mpc}^{-1}$ and $z = 0$, with the solid black line
showing the best possible result from optimal targeting and
weighting of halos ($M_{\min}$ is an implicit parameter for the
black curve). The dashed red line shows the result of using
the mean LRG HOD as a halo weight. A subtlety is that
the LRG HOD does not have a step-function cutoff: the
probability $f_c$ of a central galaxy follows an error function
and hence has no single well-defined $M_{\min}$. The red dashed
line is therefore traced out by varying a low-mass cutoff applied
to the LRG HOD. We find that the $E$ vs $n$ behavior is
within 10-20% of the optimal result as long as we cut out
halos with $f_c < 0.1$. For LRGs, at least, the mean HOD is
therefore a good choice of weight function. We will examine
other galaxy classes below.

To examine the impact of item (iii), the stochasticity of
the HOD, on mass-estimator performance, we populate the
halos in the Millennium simulation with LRGs as per the
HOD prescription. We place central galaxies in the specified
fraction of halos, plus a Poisson-distributed number of
satellite galaxies in the halos which host a central galaxy.
The red triangle in Figure 9 shows the stochasticity $E$ vs
the number density of LRGs. We find stochastically occupied
halos achieve a factor of two higher (worse) $E$ than
the deterministic weighting by the mean HOD. While the
stochastic LRG HOD requires $\sim 3\times$ as many redshifts to
reach $E = 0.5$ as an ideal survey would require, it is similar
to what one would predict from an optimal survey if
the biased-Poisson model were a correct description of halo
stochasticity (dashed black line).
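The population step can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the code used for the Millennium analysis: it draws a central galaxy with an error-function probability and, only if a central is present, adds a Poisson number of satellites. All parameter values (`m_cut`, `sigma_logm`, `m1`, `alpha`) and function names are hypothetical placeholders, not the Zheng et al. (2009) fits.

```python
import math
import random


def central_probability(mass, m_cut, sigma_logm):
    """Error-function central occupation f_c(M); parameters are placeholders."""
    x = (math.log10(mass) - math.log10(m_cut)) / sigma_logm
    return 0.5 * (1.0 + math.erf(x))


def poisson_draw(mean, rng):
    """Poisson deviate via Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def mean_occupation(mass, m_cut=1e13, sigma_logm=0.5, m1=2e14, alpha=1.0):
    """Expected galaxy count: f_c(M) * [1 + (M/M1)^alpha]."""
    f_c = central_probability(mass, m_cut, sigma_logm)
    return f_c * (1.0 + (mass / m1) ** alpha)


def populate_halo(mass, rng, m_cut=1e13, sigma_logm=0.5, m1=2e14, alpha=1.0):
    """Draw an integer occupancy; satellites only exist if a central does."""
    if rng.random() >= central_probability(mass, m_cut, sigma_logm):
        return 0
    return 1 + poisson_draw((mass / m1) ** alpha, rng)


rng = random.Random(42)
mass = 2e14  # assumed halo mass in h^-1 Msun
draws = [populate_halo(mass, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
print(f"expected mean {mean_occupation(mass):.3f}, sampled mean {sample_mean:.3f}, "
      f"variance {sample_var:.3f}")
```

The sampled mean tracks the mean HOD, while the non-zero variance at fixed mass is the extra randomness in the weight that item (iii) refers to.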

The green solid curve in Figure 9 shows the result
of weighting the halos by the number of galaxies drawn
from HODs with varying minimum galaxy luminosities, from
Zehavi et al. (2010). The behavior is similar to the LRG
HOD: even though the mean number of galaxies per halo
looks like the optimal weight, using the actual number of
galaxies as weight results in larger $E$. The randomness in
the number of galaxies in each halo introduces additional
stochasticity that degrades the mass estimator.

If galaxies are to be used to provide optimal reconstruc- 
tions of the mass, then they must themselves be weighted 
in some way so as to reduce the stochasticity in the weight 
applied to halos of a given mass. Determining the optimal 
mark is an interesting problem for the future. Formalism for 
treating this more general problem has been developed in
Sheth (2005) and can be used directly, but is beyond the
scope of this work.

Our results suggest that it is interesting to study 
how best to supplement the spectroscopic galaxies with 
a larger, deeper photometric-redshift sample. The spectro- 
scopic galaxies can be weighted by the number of photo-z 
galaxies consistent with sharing the same halo. The deeper 
photo-z catalog potentially has lower stochasticity in halo 
mass estimates. 



4.3.1 Surveys with blue or emission-line galaxies

We have seen that the mean HODs for LRGs and luminosity- 
selected galaxies are good approximations to the optimal 
weight, but the stochasticity in halo occupation degrades 
E for a given source density n. Galaxy redshift surveys 
based on emission line detection will likely result in sub- 
stantially different halo weightings, so we investigate the 
mass-reconstruction performance of such a survey relative 
to an LRG survey or optimal weighting. 

We model the emission-line sample by starting with the
mean HOD for blue galaxies given by equations (10) and (11)
and Table 4 in Zehavi et al. (2005):

$$ g_{\rm blue}(M) = \left(\frac{M}{M_B}\right)^{\alpha_B} + 0.7\, e^{-[2\log(M/10^{12}\,h^{-1}M_\odot)]^2} \qquad (67) $$



where $M_B = 7 \times 10^{13}\,h^{-1}M_\odot$ and $\alpha_B = 0.8$ (following
Sheth & Diaferio 2001). This is plotted as the cyan line in
Figure [T]. There is a bump in the number of blue galaxies
between $\sim 10^{11}\,h^{-1}M_\odot$ and $\sim 10^{12}\,h^{-1}M_\odot$, which is very
different from the optimal weight. The value of $E$ obtained by
weighting halos in the Millennium simulation according to galaxy
counts drawn from this HOD is shown as the blue triangle in
Figure 9. Although the blue galaxy sample achieves lower
Figure 9. Stochasticity $E$ as a function of the number density of
redshifts obtained in the survey. The black lines plot the result
of an ideal mass-reconstruction strategy, in which one redshift is
obtained for each halo above a cutoff mass, and an optimal $w(M)$
is applied to each surveyed halo. The dotted black curve plots
the stochasticity one would expect from this strategy if halos
were a biased Poisson sampling of the mass; the optimal survey
requires significantly fewer redshifts than predicted by the
Poisson model for the same $E$. The green line gives $E$ resulting
from weighting each halo by the number of galaxies above a
luminosity cut, assuming galaxies occupy halos as per Zehavi et al.
(2010). The blue and red triangles result from galaxy-weighting
using the HODs for blue and luminous red galaxies, respectively,
in the SDSS. The LRG and luminosity-cut surveys are worse than
optimal primarily because of randomness in the halo occupancy;
the dashed red line shows the result of eliminating this
randomness by weighting each halo with the mean HOD. Blue galaxies
are very inefficient for reconstructing the mass; if emission-line
spectroscopic galaxies are a random 10% or 1% subsample of the
blue population, they are also inefficient, not even attaining the
$E = 0.5$ required for a volume-limited power spectrum
measurement.



$E$ than the LRG sample does, note that it requires $100\times$ more
redshifts, i.e., is $100\times$ more costly than an optimally weighted
sample. If we were to under-sample the blue galaxies (e.g.
by obtaining redshifts for ten percent, or one percent, of the
sample with the brightest emission lines) then $E$ would rise
to 0.54 and 0.86, respectively, if the sub-sampling rate is
independent of halo properties. Clearly, emission-line samples
are a very inefficient way to reconstruct the mass, although
this disadvantage is countered by the fact that emission lines
can be much stronger and more easily detected than LRG
absorption features. Optimization of a survey would need to
weigh these effects.
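Sub-sampling at a rate independent of halo properties amounts to keeping each galaxy independently with a fixed probability. The following sketch shows that thinning step only; it does not reproduce the $E$ values quoted above. The per-halo counts are hypothetical, chosen so that most halos host a single galaxy, and the function name is an assumption for this example.

```python
import random


def subsample_counts(counts, keep_fraction, rng):
    """Thin per-halo galaxy counts: each galaxy is kept independently
    with probability keep_fraction, regardless of its host halo."""
    thinned = []
    for n in counts:
        kept = sum(1 for _ in range(n) if rng.random() < keep_fraction)
        thinned.append(kept)
    return thinned


rng = random.Random(0)
# Hypothetical per-halo blue-galaxy counts (not taken from the paper).
counts = [1] * 900 + [2] * 80 + [5] * 20

for keep_fraction in (1.0, 0.1, 0.01):
    thinned = subsample_counts(counts, keep_fraction, rng)
    traced = sum(1 for n in thinned if n > 0)
    print(f"keep {keep_fraction:>4}: {sum(thinned)} redshifts, "
          f"{traced} of {len(counts)} halos still traced")
```

Because most halos host only one target, random thinning removes whole halos from the catalog rather than merely lowering their weight, which is one way to see why the reconstruction degrades under sub-sampling.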



4.3.2 Surveys that under-weight massive clusters

Galaxies in massive clusters tend to be strongly deficient in 
21-cm emission and other gas-phase emission lines. A red- 
shift survey selected by such criteria will tend to miss or 
Figure 10. We plot the stochasticity of an optimal mass estimator
created from a catalog of halos in the Millennium simulation
in the mass range $10^{12}\,h^{-1}M_\odot < M < M_{\max}$ vs the
upper limit $M_{\max}$ of the halo catalog. This demonstrates that
one cannot improve the stochasticity of the estimator below some
floor unless one detects (and heavily weights) the rare halos above
$\approx 10^{14}\,h^{-1}M_\odot$.



under-weight the most massive clusters. In Figure 10
we show that the absence of high-mass halos from a catalog
sets a floor on the attainable stochasticity even with optimal
mass estimation. When halos with $\sim 10^{12}\,h^{-1}M_\odot < M <
M_{\max}$ are detected by the survey, we find that $E \approx 0.5$ is
achievable at $z = 0$, $k = 0.2\,h\,\mathrm{Mpc}^{-1}$ without detecting
massive halos. However, cosmological inferences requiring lower
values of $E$ require a means of detecting and appropriately
weighting clusters above $10^{14}\,h^{-1}M_\odot$. Surveys without the
ability to identify massive clusters reach a limit in $E$ that
cannot be improved by the addition of more low-mass halo
detections.

The floor on $E$ can be estimated using equation (47).
Consider the case when our halo catalog only contains objects
below some $m_d$, so the "dust" is the mass in massive objects.
Then $\sum_j n_j v_j$ can be large, so $E_{\rm Pois} \to 0$. $M_m$ is
dominated by the most massive objects: at $z = 0$, about 90% of
$M_m$ is contributed by objects with masses above $10^{14}\,h^{-1}M_\odot$
(we have assumed $\sigma_8 = 0.9$). If these are missing from the
catalog, then $M_d \to M_m$, making $E_{\rm opt} \to (1 + P/M_m)^{-1/2}$. As
a result, on scales of $k \sim 0.1\,h\,\mathrm{Mpc}^{-1}$, $E_{\rm opt}$ cannot be made
smaller than about 0.25.
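The limiting expression $E_{\rm opt} \to (1 + P/M_m)^{-1/2}$ is simple to evaluate. As a hedged illustration, the sketch below computes the floor for a given ratio $P/M_m$ and inverts it; the function names are ours, and the ratio of 15 is inferred here from the quoted floor of about 0.25 rather than being a number measured in the text.

```python
import math


def stochasticity_floor(power_over_shot):
    """E_opt -> (1 + P/M_m)^(-1/2) when the missing massive halos dominate M_m.

    power_over_shot is the dimensionless ratio P / M_m.
    """
    return 1.0 / math.sqrt(1.0 + power_over_shot)


def ratio_for_floor(e_floor):
    """Invert the floor expression: P/M_m = 1/E^2 - 1."""
    return 1.0 / e_floor ** 2 - 1.0


# Ratio implied by the quoted floor of ~0.25 (inferred, not from the paper).
implied_ratio = ratio_for_floor(0.25)
print(f"P/M_m implied by a floor of 0.25: {implied_ratio:.1f}")

# The floor weakens (E rises) as the power P drops relative to M_m.
for ratio in (30.0, 15.0, 3.0, 1.0):
    print(f"P/M_m = {ratio:5.1f}  ->  E floor = {stochasticity_floor(ratio):.3f}")
```

Since the ratio $P/M_m$ depends on $k$ through $P$, the same missing-cluster population gives a different floor on different scales.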



5 CONCLUSION AND DISCUSSION 

We have determined the weighting scheme that minimizes
the stochasticity between the linearly weighted halo field
and the associated mass density field. The optimal weight
function depends on the mass range of the halo catalog, how
much mass is missing from the halo catalog, and how the
halos cluster. (I.e., we showed how the weight is modified
by a halo-mass-dependent selection function.) We show that
neither mass weighting nor bias weighting of the halos is
optimal. The first principal component of the halo covariance
matrix $C$ is also usually not optimal, although H10 show
that the weakest PC of the altered noise matrix $C - bPb^{T}$ is
a good approximation to an optimal weight when the numbers
of halos in all bins are equal. Rather, we demonstrate
that, under very general circumstances, the optimal weight
will be a mix of bias weighting and mass weighting, simply
because the mass is composed of the mass-weighted halo
catalog plus the mass in the "dust bin" of structures below
the halo detection threshold.

The halo model can generally give a reasonably good
description of the optimal weight function and its associated
stochasticity with two important alterations: first, it is
necessary to treat the halos as though they sample a continuous
"halo field" that is distinct from the mass distribution,
and that they have a bias $v(M)$ with respect to the halo
field that differs from the bias $b(M)$ with respect to mass.
Second, halo catalogs extending below $10^{12}\,h^{-1}M_\odot$ do not
reconstruct the mass as well as the halo model predicts.
However, we find that the model generally overestimates
the optimal stochasticity, even on the large scales where one
might have expected to find good agreement. We suspect
this is due to the combined effects of non-linear bias and
halo exclusion, which our treatment currently ignores.

We also note that the randomness in the halo
shapes at fixed mass (i.e. ellipticity, concentration, and/or
substructure) introduces stochasticity into a mass estimator
built from a halo catalog in which the halo profiles are
reduced to points, setting a lower limit on the attainable $E$.
In the Millennium catalog, this structure stochasticity
substantially degrades the mass estimation for $k > 0.2\,h\,\mathrm{Mpc}^{-1}$.
Since information about halo shapes is difficult to obtain in
observations, the lower limit on $E$ from point-like halos in
simulations is also the best one can achieve in real
observations. However, because galaxies are expected to be
reasonably faithful tracers of halo profiles, it may be that they
can be used to further reduce $E$ into the non-linear regime.

An optimally weighted halo catalog can have an effective
number density $(nb^2)_{\rm eff}$ up to $15\times$ better (higher)
than one would have predicted for the same halo catalog
in a biased-Poisson model of halo stochasticity. This gain
means that a volume-limited measurement of the linear-regime
power spectrum of matter for the entire observable
$z < 1$ universe could in principle be accomplished with
only 6 million spectroscopic redshift measurements. Such a
program would require outside information, perhaps a deep
imaging photo-z survey, to identify halos and provide reliable
mass estimates or marks to apply to spectroscopic targets.
(See H10 for an estimate of the effect of mass-estimator
degradation due to a generic log-normal error distribution
in the estimation of halo masses.)

We use halo occupancy distribution models to estimate
the stochasticity $E$ resulting from more traditional surveys
which apply uniform weights to targeted galaxies. The mean
HODs for luminosity-thresholded samples and LRGs are
remarkably useful approximations to the optimal weight
functions. However, additional stochasticity is introduced into
the mass estimator by the random variations in halo occupancy
about the mean. Hence luminosity-selected or LRG
catalogs require $\approx 3\times$ more redshifts to reach a given $E$
than a survey with perfect knowledge of halo masses for
which the optimal weighting can be applied. In contrast,
HODs for blue or emission-line galaxies do not resemble the
optimal weights, and hence require $\approx 100\times$ more redshift
measurements than an optimally weighted survey to obtain
a given $E$. Randomly sub-sampling such galaxies to the same
space density as LRGs yields $E$ values that are $2\times$ larger
than those for LRGs. We also find that low-mass halos cannot
reconstruct the shot noise contributed by the massive
halos, setting an upper limit on the fidelity of the mass
reconstruction for surveys that fail to identify the most
massive clusters.

Application of optimal halo weighting can be even more
beneficial for studies of cross-correlation between gravitational
potential (i.e. mass) and other cosmological signals,
since these experiments gain rapidly as the stochasticity $E$
drops below the $E = 0.5$ needed to make volume-limited
power-spectrum measurements. Park et al. (2010) find, for
example, that mass weighting halos can greatly improve accuracy
in estimation of gravitational potential if the halo catalog
extends down to $\sim 10^{13}\,h^{-1}M_\odot$ or lower.

Applying the optimal weight is obviously efficient in re- 
ducing noise in the estimation of BAO from power spec- 
tra and in cross-correlation cosmological tests. In future 
work we will extend this study to the use of redshift space 
distortions to measure the growth rate of structure (e.g.
Okumura & Jing 2010). We are also investigating the
potential for galaxy marking and non-linear mass estimators
to further improve the ability to trace large-scale structure 
with observational data. 